ABSTRACT. This study is dedicated to precise distributional analyses of the height of
non-plane unlabelled binary trees ("Otter trees"), when trees of a given size are taken with 
equal likelihood. The height of a rooted tree of size n is proved to admit a limiting theta 
distribution, both in a central and local sense, and obey moderate as well as large deviations 
estimates. The approximations obtained for height also yield the limiting distribution of 
the diameter of unrooted trees. The proofs rely on a precise analysis, in the complex plane 
and near singularities, of generating functions associated with trees of bounded height. 

Introduction 

We consider trees that are binary, non-plane, unlabelled, and rooted; that is, a tree 
is taken in the graph-theoretic sense and it has nodes of (out)degree two or zero only; a 
special node is distinguished, the root, which has degree two. In this model, the nodes 
are indistinguishable, and no order is assumed between the neighbours of a node. Let $\mathcal{Y}$
denote the class of such trees, and let $\mathcal{Y}_n$ be the subset consisting of trees with $n$ external
nodes (i.e., nodes of degree zero). In this article, we study the (random) height $H_n$ of a
tree sampled uniformly from $\mathcal{Y}_n$.

Most of the results concerning random trees of fixed size are relative to the situation 
where one can distinguish the neighbours of a node, either by their labels (labelled trees), 
or by the order induced on the progeny through an embedding in the plane (plane trees); 
see the reference books [14, 19] and the discussion by Aldous [3], who globally refers to
these as "ordered trees". In this range of models, Meir and Moon [33] determined that the
depth of nodes is typically $O(\sqrt{n})$ for all "simple varieties" of trees, which are determined
by restricting in an arbitrary way the collection of allowed node degrees. Regarding height,
a few special cases were studied early: Rényi and Szekeres [38] proved in particular that
the average height of labelled non-plane trees of size $n$ is asymptotic to $\sqrt{2\pi n}$. De Bruijn,
Knuth, and Rice [12] dealt with plane trees and showed that the average height is equivalent
to $\sqrt{\pi n}$ as $n \to \infty$. Eventually, Flajolet and Odlyzko [18] developed an approach for
height that encompasses all simple varieties of trees; see also [20] for additional results.

Under such models with distinguishable neighbourhoods, trees of a fixed size n may be 
seen as Galton-Watson processes (branching processes) conditioned on the size being n, 
see [1, 26, 28], and there are natural random walks associated to various tree traversals.
Accordingly, probabilistic techniques have been successfully applied to quantify tree height
and width [8, 9], based on Brownian excursion. An important probabilistic approach
consists in establishing the existence of a continuous limit of suitably rescaled random trees
of increasing sizes; one can then read off, to first asymptotic order at least, some of the
limit parameters directly on the limiting object. The latter point of view has been adopted
by Aldous [2, 3, 4] in his definition of the continuum random tree (CRT): see the survey
by Le Gall [30] for a recent account of probabilistic developments along these lines.

The case of trees (as are considered here) that have indistinguishable neighbourhoods 
is essentially different. Such trees cannot be generated by a branching process conditioned 
by size and no direct random walk approach appears to be possible, due to the inherent 
presence of symmetries. (An analysis of such symmetries otherwise occurs in the recent 
article [6].) The analysis of unlabelled non-plane trees finds its origins in the works of 
Pólya [36] and Otter [35]. However, these authors mostly focused on enumeration, and the
problem of characterizing typical parameters of these random trees remained largely
untouched. Recently, in an independent study, Drmota and Gittenberger [15] have examined
the profile of "general" trees (where all degrees are allowed) and shown that the joint
distribution of the number of nodes at a finite number of levels converges weakly to the 
finite dimensional distribution of Brownian excursion local times. They further extended 
the result to a convergence of the entire profile to the Brownian excursion local time. 

The foregoing discussion suggests that, although there is no clear exact reduction of
unlabelled non-plane trees to random walks, such trees largely behave like simply generated
families of ordered trees. In particular, it suggests that the rescaled height $H_n/\sqrt{n}$ is likely
to admit a limit distribution of the theta-function type [16, 18, 27, 38]. We shall prove that
such is indeed the case for non-plane binary trees in Theorems 1 and 2 below. We also
provide moderate and large deviations estimates (Theorems 4 and 5), as well as asymptotic
estimates for moments (Theorem 3); see §5. Equipped with solid analytic estimates
regarding height, we can then proceed to characterize the diameter of unrooted trees in §6, 
this both in a local and central form (Theorems 6 and 7). Some a posteriori observations 
that complete the picture are offered in our Conclusion section, §7. 

A preliminary investigation of the distribution of height in rooted trees is reported in the 
extended abstract [7]. Our interest in this range of problems initially arose from questions 
of Jean-François Marckert and Grégory Miermont [32], in their endeavour to extend the
probabilistic methods of Aldous to non-plane trees and develop corresponding continuous
models; we are indebted to them for being at the origin of the present study.



1. Trees and generating functions

Tree enumeration. Our approach is entirely based on generating functions. The class $\mathcal{Y}$
of (non-plane, unlabelled, rooted) binary trees is defined to include the tree with a single
external node. A tree has size $n$ if it has $n$ external nodes, hence $n - 1$ internal nodes.
The cardinality of the subclass $\mathcal{Y}_n$ of trees of size $n$ is denoted by $y_n$ and the generating
function (GF) of $\mathcal{Y}$ is

    $y(z) := \sum_{n \ge 1} y_n z^n = z + z^2 + z^3 + 2z^4 + 3z^5 + 6z^6 + 11z^7 + 23z^8 + \cdots,$

the coefficients corresponding to the entry A001190 of Sloane's On-line Encyclopedia of
Integer Sequences. The trees of $\mathcal{Y}$ with size at most 6 are shown in Figure 1.

Figure 1. The binary unlabelled trees of size less than six.

A binary tree is either an external node or a root appended to an unordered pair of two
(not necessarily distinct) binary trees. In the language of analytic combinatorics [19], this
corresponds to the (recursive) specification

    $\mathcal{Y} = \mathcal{Z} + \mathrm{MSet}_2(\mathcal{Y}),$

where $\mathcal{Z}$ represents a generic atom (of size 1) and $\mathrm{MSet}_2$ forms multisets of two elements.
The basic functional equation

(1)    $y(z) = z + \tfrac12 y(z)^2 + \tfrac12 y(z^2),$

closely related to the early works of Pólya (1937; see [36, 37]), and first studied by
Otter (1948; see [35]), follows from fundamental principles of combinatorial enumeration [19,
25]. The term $\tfrac12 y(z^2)$ accounts for potential symmetries; hereafter, we refer to such terms
involving functions of $z^2, z^3, \ldots$, as Pólya terms. According to the general theory of
analytic combinatorics, we shall operate in an essential manner with properties of generating
functions in the complex plane. The following lemma is classical but we sketch a proof, as
its ingredients are needed throughout our work.
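As a small illustration of how the functional equation (1) determines the counting sequence, the sketch below extracts coefficients on both sides: the coefficient of $z^n$ in $\tfrac12 y(z)^2$ counts ordered pairs of subtrees, and the Pólya term $\tfrac12 y(z^2)$ contributes only for even $n$. The function name `otter_counts` is an illustrative choice, not something taken from the paper.

```python
def otter_counts(n_max):
    """Return [y_1, ..., y_{n_max}], read off from y(z) = z + y(z)^2/2 + y(z^2)/2."""
    y = [0] * (n_max + 1)          # y[0] is unused padding
    y[1] = 1                       # the single external node
    for n in range(2, n_max + 1):
        # coefficient of z^n in y(z)^2: ordered pairs of root subtrees
        pairs = sum(y[k] * y[n - k] for k in range(1, n))
        # coefficient of z^n in y(z^2): two identical subtrees, even n only
        sym = y[n // 2] if n % 2 == 0 else 0
        y[n] = (pairs + sym) // 2
    return y[1:]


print(otter_counts(8))
```

The first eight values agree with the expansion of $y(z)$ displayed above.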

Lemma 1 (Otter [35]). Let $\rho$ be the radius of convergence of $y(z)$. Then, one has $1/4 \le
\rho < 1/2$, and $\rho$ is determined implicitly by $2\rho + y(\rho^2) = 1$. As $z \to \rho$, the generating
function $y(z)$ satisfies

(2)    $y(z) = 1 - \lambda \sqrt{1 - z/\rho} + O(1 - z/\rho), \qquad \lambda = \sqrt{2\rho + 2\rho^2 y'(\rho^2)}.$

Furthermore, the number $y_n$ of trees of size $n$ satisfies asymptotically

(3)    $y_n = \frac{\lambda}{2\sqrt{\pi}}\, n^{-3/2} \rho^{-n} \left(1 + O\!\left(\frac{1}{n}\right)\right).$

Proof. The number of plane binary trees with $n$ external nodes is given by the Catalan
number $C_{n-1} = \frac{1}{n} \binom{2n-2}{n-1}$. The number of symmetries in a tree of size $n$ being a priori
between 1 and $2^{n-1}$, one has the bounds

    $C_{n-1}\, 2^{1-n} \le y_n \le C_{n-1}.$

As it is well known, the Catalan numbers satisfy $C_n \sim 4^n / \sqrt{\pi n^3}$, which implies that the radius
of convergence $\rho$ satisfies the bounds $1/4 \le \rho \le 1/2$. It follows that $y(z^2)$ is analytic
in a disc of radius $\sqrt{\rho}$, which properly contains $\{|z| \le \rho\}$. Then, from (1), upon solving
for $y(z)$, we obtain

(4)    $y(z) = 1 - \sqrt{1 - 2z - y(z^2)},$

which can only become singular when the argument of the square root vanishes. By
Pringsheim's Theorem [19, p. 240], the value $\rho$ is then the smallest positive solution of
$2z + y(z^2) = 1$, corresponding to a simple root, and, at this point, we must have $y(\rho) = 1$,
given (4). This reasoning also justifies the singular expansion (2), which is seen to be valid
in a $\Delta$-domain [19, §VI.3], i.e., a domain of the form

(5)    $\{z : |z| < \rho + \epsilon,\ z \ne \rho,\ |\arg(z - \rho)| > \theta\}, \qquad \epsilon, \theta > 0,$

that extends beyond the disc of convergence $|z| \le \rho$.

Equation (3) constitutes Otter's celebrated estimate: it results from translating the square
root singularity of $y(z)$ by means of either Darboux's method [25, 35, 36] or singularity
analysis [19]. □

Numerically, one finds [17, 19, 35]:

    $\rho \doteq 0.40269\,75036\,7, \qquad \lambda \doteq 1.13003\,37163, \qquad \frac{\lambda}{2\sqrt{\pi}} \doteq 0.31877\,66259.$
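The constants above can be recovered, to floating-point accuracy, from the characterization of Lemma 1. The sketch below is one possible numerical route and is not claimed to be the method of [17, 19, 35]: it solves $2z + y(z^2) = 1$ by bisection on $[1/4, 1/2]$, summing $y(z^2)$ as a truncated series (which converges fast since $z^2$ lies well inside the disc of convergence), then evaluates $\lambda = \sqrt{2\rho + 2\rho^2 y'(\rho^2)}$. The truncation order and the number of bisection steps are arbitrary choices.

```python
import math


def otter_counts(n_max):
    """y[n] = number of trees with n external nodes (y[0] is padding)."""
    y = [0] * (n_max + 1)
    y[1] = 1
    for n in range(2, n_max + 1):
        pairs = sum(y[k] * y[n - k] for k in range(1, n))
        sym = y[n // 2] if n % 2 == 0 else 0
        y[n] = (pairs + sym) // 2
    return y


def otter_constants(n_terms=80, steps=60):
    """Approximate (rho, lambda) of Lemma 1 by bisection on 2z + y(z^2) = 1."""
    y = otter_counts(n_terms)

    def series(w):                       # truncated y(w), accurate for small w
        return sum(y[n] * w ** n for n in range(1, n_terms + 1))

    def derivative(w):                   # truncated y'(w)
        return sum(n * y[n] * w ** (n - 1) for n in range(1, n_terms + 1))

    lo, hi = 0.25, 0.5                   # bounds on rho from the proof of Lemma 1
    for _ in range(steps):
        mid = (lo + hi) / 2
        if 2 * mid + series(mid * mid) < 1:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2
    lam = math.sqrt(2 * rho + 2 * rho ** 2 * derivative(rho ** 2))
    return rho, lam


rho, lam = otter_constants()
print(rho, lam, lam / (2 * math.sqrt(math.pi)))
```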

Height. In a tree, height is defined as the maximum number of edges along branches
connecting the root to an external node. Let $y_{h,n}$ be the number of trees of size $n$ and
height at most $h$ and let $y_h(z) = \sum_{n \ge 1} y_{h,n} z^n$ be the corresponding generating function.
The arguments leading to (1) yield the fundamental recurrence

(6)    $y_{h+1}(z) = z + \tfrac12 y_h(z)^2 + \tfrac12 y_h(z^2), \qquad h \ge 0,$

with initial value $y_0(z) = z$, and

    $y_1(z) = z + z^2, \qquad y_2(z) = z + z^2 + z^3 + z^4,$
    $y_3(z) = z + z^2 + z^3 + 2z^4 + 2z^5 + 2z^6 + z^7 + z^8.$
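The recurrence (6) can be run directly on coefficient lists. The following sketch (with the illustrative name `height_bounded_polys`) truncates every polynomial at degree `n_max` and reproduces the first polynomials $y_1$, $y_2$, $y_3$; note that $y_3$ ends with the term $z^8$, the complete binary tree of height 3.

```python
def height_bounded_polys(h_max, n_max):
    """Rows y_0, ..., y_{h_max}; row h holds y_{h,n} for n = 0..n_max (n_max >= 1)."""
    first = [0] * (n_max + 1)
    first[1] = 1                                   # y_0(z) = z
    rows = [first]
    for _ in range(h_max):
        prev = rows[-1]
        cur = [0] * (n_max + 1)
        cur[1] = 1                                 # the term z in (6)
        for n in range(2, n_max + 1):
            pairs = sum(prev[k] * prev[n - k] for k in range(1, n))   # [z^n] y_h(z)^2
            sym = prev[n // 2] if n % 2 == 0 else 0                   # [z^n] y_h(z^2)
            cur[n] = (pairs + sym) // 2
        rows.append(cur)
    return rows


rows = height_bounded_polys(3, 8)
for h, row in enumerate(rows):
    print(h, row[1:])
```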

A central role in what follows is played by the generating function of trees with height
exceeding $h$:

(7)    $e_h(z) = \sum_{n \ge 1} e_{h,n} z^n := y(z) - y_h(z).$

Then, a trite calculation shows that the $e_h(z)$ satisfy the main recurrence

(8)    $e_{h+1}(z) = y(z)\, e_h(z) \left(1 - \frac{e_h(z)}{2 y(z)}\right) + \frac{e_h(z^2)}{2}, \qquad e_0(z) = y(z) - z,$

on which our subsequent treatment of height is entirely based.
Analysis. The distribution of height is accessible by

(9)    $\mathbb{P}\{H_n > h\} = \frac{y_n - y_{h,n}}{y_n} = \frac{e_{h,n}}{y_n},$

where $e_{h,n} = [z^n] e_h(z)$. Lemma 1 provides an estimate for $y_n$, and we shall get a handle
on the asymptotic properties of $e_{h,n}$ by means of Cauchy's coefficient formula,

(10)    $e_{h,n} = \frac{1}{2i\pi} \int_\gamma e_h(z)\, \frac{dz}{z^{n+1}},$

upon choosing a suitable integration contour $\gamma$ in (10), of the form commonly used in
singularity analysis theory [19]; see Figure 2 below. This task necessitates first developing
suitable estimates of $e_h(z)$, for values of $z$ both inside and outside of the disc of
convergence $|z| \le \rho$. Precisely, we shall need estimates valid in a "tube" around an arc of the
circle $|z| = \rho$, as well as inside a "sandclock" anchored at $\rho$ (see Figure 2).
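For small sizes, the tail probabilities in (9) can be computed exactly rather than asymptotically. The sketch below is a brute-force check under the definitions above, not part of the analytic argument: it iterates (6) on truncated coefficient lists and returns $\mathbb{P}\{H_n > h\} = e_{h,n}/y_n$ as exact fractions, using the fact that a tree of size $n$ has height at most $n - 1$. The helper names are illustrative.

```python
from fractions import Fraction


def next_level(prev, n_max):
    """One step of (6) on coefficient lists truncated at degree n_max."""
    cur = [0] * (n_max + 1)
    cur[1] = 1
    for n in range(2, n_max + 1):
        pairs = sum(prev[k] * prev[n - k] for k in range(1, n))
        sym = prev[n // 2] if n % 2 == 0 else 0
        cur[n] = (pairs + sym) // 2
    return cur


def height_tail(n):
    """Return [P(H_n > h) for h = 0, ..., n-1] as exact fractions."""
    level = [0] * (n + 1)
    level[1] = 1                          # y_0(z) = z
    counts = [level[n]]                   # y_{h,n} for h = 0, 1, ...
    for _ in range(n - 1):
        level = next_level(level, n)
        counts.append(level[n])
    total = counts[-1]                    # height <= n - 1, so y_{n-1,n} = y_n
    return [Fraction(total - c, total) for c in counts]


print(height_tail(8))
```

For instance, among the 23 trees of size 8 only the complete tree has height at most 3, so the entry for $h = 3$ equals 22/23.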

Definition 1. The "tube" $T(\mu, \eta)$ of width $\mu$ and angle $\eta$ is defined as

(11)    $T(\mu, \eta) := \{z : -\mu < |z| - \rho < \mu,\ |\arg(z)| > \eta\}.$

The "sandclock" of radius $r_0$ and angle $\theta_0$ anchored at $\rho$ is defined as

(12)    $S(r_0, \theta_0) := \{z : |z - \rho| < r_0,\ \pi/2 - \theta_0 < |\arg(z - \rho)| < \pi/2 + \theta_0\}.$
Figure 2. Left: the "tube" and "sandclock" regions. Right: the Hankel contour used to estimate $e_{h,n}$
(details are given in Figure 3).



Strategy and overview of the results. Estimates of the sequence of generating
functions $(e_h(z))$ within the disc of convergence and a tube, where $z$ stays away from the
singularity $\rho$, are comparatively easy: they form the subject of Section 2. In particular,
Proposition 1 states that we can always find thinner and thinner tubes that come arbitrarily
close to the singularity $\rho$ and where the convergence $y_h \to y$, $e_h \to 0$ is ensured. The bulk
of the technical work is relative to the sandclock, in Section 3, where Proposition 2 grants
us the existence of a suitable sandclock for convergence. We can then develop in Section 4
our main approximation:

(13)    $e_h(z) = y(z) - y_h(z) \approx 2\, \frac{1 - y}{1 - y^h}\, y^h.$

Here, the symbol $\approx$ is to be loosely interpreted in the sense of "approximately equal"; a
formal statement is postponed and summarized in Proposition 3.
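To get a feeling for the quality of (13), one can evaluate both sides numerically at a real point just below the singularity. The sketch below is only a floating-point experiment under stated assumptions: $y(z)$ is computed from (4) with $y(z^2)$ summed as a truncated series, and $y_h(z)$ is obtained by iterating (6) jointly at $z, z^2, z^4, \ldots$, the last point being so small that $y_h(w) \approx w$ there. The constant `RHO` is the numerical value quoted after Lemma 1; the function names, the truncation order, the depth and the sample point are arbitrary choices.

```python
import math

RHO = 0.40269750367  # numerical value of rho quoted after Lemma 1


def otter_counts(n_max):
    """y[n] = number of trees with n external nodes, from equation (1)."""
    y = [0] * (n_max + 1)
    y[1] = 1
    for n in range(2, n_max + 1):
        pairs = sum(y[k] * y[n - k] for k in range(1, n))
        sym = y[n // 2] if n % 2 == 0 else 0
        y[n] = (pairs + sym) // 2
    return y


def y_exact(z, n_terms=60):
    """y(z) for real 0 < z < RHO via (4); y(z^2) is summed as a series."""
    coeffs = otter_counts(n_terms)
    y_sq = sum(coeffs[n] * (z * z) ** n for n in range(1, n_terms + 1))
    return 1.0 - math.sqrt(1.0 - 2.0 * z - y_sq)


def y_bounded(z, h, depth=8):
    """y_h(z), iterating (6) jointly at z, z^2, z^4, ..., z^(2^depth)."""
    pts = [z ** (2 ** j) for j in range(depth + 1)]
    vals = pts[:]                                   # y_0(w) = w
    for _ in range(h):
        new = [pts[j] + (vals[j] ** 2 + vals[j + 1]) / 2 for j in range(depth)]
        new.append(pts[depth])                      # y_h(w) ~ w at the tiny last point
        vals = new
    return vals[0]


def approximation_ratio(z, h):
    """e_h(z) divided by the right-hand side of (13)."""
    y = y_exact(z)
    e_h = y - y_bounded(z, h)
    return e_h / (2 * (1 - y) / (1 - y ** h) * y ** h)


z = RHO * (1 - 1e-4)
for h in (50, 200, 800):
    print(h, approximation_ratio(z, h))
```

The printed ratios should be close to, but not equal to, 1: the symbol $\approx$ hides correction terms that only become negligible in the regime made precise by Proposition 3.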

The form of the approximation in (13) is similar to that in the original paper by Flajolet
and Odlyzko [18] where trees are ordered. Its justification spans Sections 2-4, which
closely follow the general strategy in [18]; however, nontrivial adaptations are needed, due
to the presence of Pólya terms, so that the problem is no longer of a "pure" iteration type.

We then reap the crop in Section 5. There, we use (9), the approximation in (13) and the
square root singularity of $y$ at $\rho$ to prove the following theorem relative to the distribution
of height $H_n$:

Theorem 1 (Limit law of height). The height $H_n$ of a random tree taken uniformly from
$\mathcal{Y}_n$ admits a limiting theta distribution: for any fixed $x > 0$, there holds

    $\lim_{n \to \infty} \mathbb{P}(H_n > \lambda^{-1} x \sqrt{n}) = \Theta(x), \qquad \lambda := \sqrt{2\rho + 2\rho^2 y'(\rho^2)},$

where $\Theta(x) := \sum_{k \ge 1} (k^2 x^2 - 2)\, e^{-k^2 x^2/4}.$
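The limiting tail $\Theta(x) = \sum_{k \ge 1} (k^2 x^2 - 2)\, e^{-k^2 x^2/4}$ is easy to evaluate by direct summation. The sketch below (function name and stopping rule are illustrative choices) stops once the terms are past their maximum, which occurs at $k^2 x^2 = 6$, and have dropped below a tolerance.

```python
import math


def theta_tail(x, tol=1e-17):
    """Theta(x) = sum_{k>=1} (k^2 x^2 - 2) exp(-k^2 x^2 / 4), for x > 0."""
    total, k = 0.0, 1
    while True:
        u = (k * x) ** 2
        term = (u - 2.0) * math.exp(-u / 4.0)
        total += term
        # (u - 2) exp(-u/4) decreases for u > 6, so stopping here is safe
        if u > 8.0 and abs(term) < tol:
            return total
        k += 1


for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(x, theta_tail(x))
```

As expected of a tail function, the values decrease from about 1 (for small $x$) to 0 (for large $x$); the cancellation between the negative first terms and the later positive ones is what makes $\Theta(x)$ numerically indistinguishable from 1 at $x = 1$.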

Our formal version of the approximation in (13) (Proposition 3) is also strong enough to
grant us access to a local limit law for the height $H_n$:

Theorem 2 (Local limit law of height). The distribution of the height $H_n$ of a random
tree taken uniformly from $\mathcal{Y}_n$ admits a local limit: for $x$ in a compact set of $\mathbb{R}_{>0}$ and
$h = \lambda^{-1} x \sqrt{n}$ an integer, there holds uniformly

    $\mathbb{P}(H_n = h) \sim \frac{\lambda}{\sqrt{n}}\, \vartheta(x),$

where $\vartheta(x) := -\Theta'(x) = (2x)^{-1} \sum_{k \ge 1} (k^4 x^4 - 6 k^2 x^2)\, e^{-k^2 x^2/4}.$

Note that the results above appear to parallel the weak limit theorem and local limit
laws known in the planar case [20]. Further theorems about the asymptotics of (integer)
moments of $H_n$, together with moderate and large deviations, may also be extracted from
(13); we only state the one for the moments, the others may be found in Section 5.

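Theorem 3 (Moments of height). Let $r \ge 1$. The $r$th moment of height $H_n$ satisfies

(14)    $\mathbb{E}[H_n] \sim \frac{2}{\lambda} \sqrt{\pi n}$ and $\mathbb{E}[H_n^r] \sim r(r-1)\, \zeta(r)\, \Gamma(r/2) \left(\frac{2}{\lambda}\right)^r n^{r/2}, \qquad r \ge 2.$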
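The constants in (14) are consistent with the theta law of Theorem 1, as the following heuristic computation shows. It is only a consistency check, assuming that moments may be obtained by integrating the limiting tail term by term (for $r \ge 2$); it is not the proof given in Section 5.

```latex
% Heuristic: replace the tail of H_n by its limit \Theta and integrate.
\mathbb{E}[H_n^r] \approx r \int_0^\infty h^{r-1}\,
      \Theta\!\left(\frac{\lambda h}{\sqrt{n}}\right) dh
  = r \left(\frac{\sqrt{n}}{\lambda}\right)^{r} \int_0^\infty x^{r-1}\,\Theta(x)\,dx .

% Term k of \Theta, with the substitution u = k^2 x^2 / 4:
\int_0^\infty x^{r-1} (k^2 x^2 - 2)\, e^{-k^2 x^2/4}\,dx
  = \frac{2^{r-1}}{k^r}
    \left[ 4\,\Gamma\!\left(\tfrac{r}{2}+1\right) - 2\,\Gamma\!\left(\tfrac{r}{2}\right) \right]
  = \frac{2^{r}(r-1)}{k^r}\,\Gamma\!\left(\tfrac{r}{2}\right).

% Summing over k >= 1 gives the constant of (14):
\mathbb{E}[H_n^r] \approx r(r-1)\,\zeta(r)\,\Gamma\!\left(\tfrac{r}{2}\right)
      \left(\frac{2}{\lambda}\right)^{r} n^{r/2}.
```

Letting $r \to 1$, where $(r-1)\zeta(r) \to 1$ and $\Gamma(1/2) = \sqrt{\pi}$, formally recovers the first estimate $\mathbb{E}[H_n] \sim (2/\lambda)\sqrt{\pi n}$.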

Finally, in Section 6, we analyse the diameter of unrooted trees using a reduction to the 
rooted tree case. There, we provide theorems similar to Theorems 1, 2 and 3, i.e., a weak
limit theorem, a local limit law, and asymptotics for the moments. The precise definition of 
the model of unrooted trees, and the statement of the results are postponed until Section 6. 

2. Convergence away from the singularity in tubes 

Our aim in this section¹ is to extend the domain where $e_h$ is analytic beyond the disc of
convergence $|z| \le \rho$, when $z$ stays in a "tube" $T(\mu, \eta)$ as defined in (11) and is thus away
from $\rho$. The main result is summarized by Proposition 1, at the end of this section. Its
proof relies on the combination of two ingredients: first, the fact, expressed by Lemma 2,
that the $e_h$ converge to 0, equivalently, $y_h \to y$, in the closed disc of radius $\rho$ (this property
is the consequence of the $n^{-3/2}$ subexponential factor in the asymptotic form of $y_n$, which
implies convergence of $y(\rho)$); second, a general criterion for convergence of the $e_h$ to 0,
which is expressed by Lemma 3. The criterion implies in essence that the convergence
domain is an open set, and this fact provides the basic analytic continuation of the generating
functions of interest.

Lemma 2. For all $z$ such that $|z| \le \rho$, and $h \ge 1$, one has

    $|e_h(z)| \le \frac{1}{\sqrt{h}} \left(\frac{|z|}{\rho}\right)^h.$

Proof. To have height at least $h$, a tree needs at least $h + 1$ nodes, so that $|e_h(z)| \le
\sum_{n > h} y_n |z|^n$. We first note an easy numerical refinement of (3), namely, $y_n \le \frac12 \rho^{-n} n^{-3/2}$,
obtained by combining the first few exact values of $y_n$ with the asymptotic estimate (3).
(See [22] for a detailed proof strategy in the case of a similar but harder problem.) This
implies

    $|e_h(z)| \le \frac12 \sum_{n > h} \left(\frac{|z|}{\rho}\right)^n \frac{1}{n^{3/2}} \le \frac12 \left(\frac{|z|}{\rho}\right)^h \int_h^\infty \frac{dt}{t^{3/2}} = \left(\frac{|z|}{\rho}\right)^h \frac{1}{\sqrt{h}},$

and the statement results. □
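The coefficient bound $y_n \le \frac12 \rho^{-n} n^{-3/2}$ used in the proof of Lemma 2 can be checked numerically over a finite range. The sketch below does so for $n \le 200$; this is evidence on a finite range, not a proof for all $n$. The constant `RHO` is the numerical value quoted after Lemma 1, and the function names are illustrative.

```python
RHO = 0.40269750367  # numerical value of rho quoted after Lemma 1


def otter_counts(n_max):
    """y[n] = number of trees with n external nodes (y[0] is padding)."""
    y = [0] * (n_max + 1)
    y[1] = 1
    for n in range(2, n_max + 1):
        pairs = sum(y[k] * y[n - k] for k in range(1, n))
        sym = y[n // 2] if n % 2 == 0 else 0
        y[n] = (pairs + sym) // 2
    return y


def coefficient_bound_ratios(n_max=200):
    """Return y_n * rho^n * n^(3/2) for n = 1..n_max; the claim is that each is <= 1/2."""
    y = otter_counts(n_max)
    return [y[n] * RHO ** n * n ** 1.5 for n in range(1, n_max + 1)]


ratios = coefficient_bound_ratios()
print(max(ratios), ratios[-1])
```

The largest ratio occurs at small $n$, and the last one is already close to the constant $\lambda/(2\sqrt{\pi}) \doteq 0.3188$ of Otter's estimate (3).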



¹ In what follows, we freely omit the arguments of $y(z)$, $e_h(z)$, $y_h(z)$, ..., whenever they are taken at $z$. (We
reserve $h$ for height and $n$ for size, so that no ambiguity should arise: $y_h$ means $y_h(z)$, whereas $y_n$ invariably
represents $[z^n] y(z)$.)
We now devise a criterion for the convergence of $e_h(z)$ to zero. This criterion, adapted
from [18, Lemma 1], is crucial in obtaining extended convergence regions, both near the
circle $|z| = \rho$ (in this section) and near the singularity $\rho$ (in Section 3).

Lemma 3 (Convergence criterion). Define the domain²

(15)    $\mathcal{D} := \{z : |y(z)| < 1\}.$

Assume that $z$ satisfies the conditions $z \in \mathcal{D}$ and $|z| < \sqrt{\rho}$. The sequence $\{|e_h(z)|,\ h \ge 0\}$
converges to 0 if and only if there exist an integer $m \ge 1$ and real numbers $\alpha, \beta \in (0, 1)$,
such that the following three conditions are simultaneously met:

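(16)    $|e_m| < \alpha, \qquad |y| + \frac{\alpha}{2} < \beta, \qquad \alpha\beta + \left(\frac{|z|^2}{\rho}\right)^m < \alpha.$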

Furthermore, if (16) holds then, for some constant $C$ and $\beta_0 \in (0, 1)$, one has the
geometric convergence

(17)    $|e_h| \le C h \beta_0^h,$

for all $h \ge m$.

Proof. (i) Convergence implies that (16) is satisfied, for some $m$. Assume that $z \in \mathcal{D}$,
$|z| < \sqrt{\rho}$, and $e_h(z) \to 0$ as $h \to \infty$. Then choose $\beta$ such that $|y| < \beta < 1$. This gives
a possible value for $\alpha$, say, $\alpha = \beta - |y|$. Choose $m_0$ such that, for all $m \ge m_0$, one has
$|e_m| < \alpha$; then choose $m_1$ large enough, so that the third condition of (16) is satisfied. The
three conditions of (16) are now satisfied by taking $m = \max(m_0, m_1)$.

(ii) Condition (16) implies convergence and the bound (17). Conversely, assume the
three conditions in (16), for some value $m$. Then, they also hold for $m + 1$. Indeed,
recalling (8), we see that, for any $h \ge 1$,

(18)    $|e_{h+1}| \le |e_h| \left(|y| + \frac{|e_h|}{2}\right) + \frac{|e_h(z^2)|}{2} \le |e_h| \left(|y| + \frac{|e_h|}{2}\right) + \left(\frac{|z|^2}{\rho}\right)^h,$

where the Pólya term involving $|e_h(z^2)|$ has been bounded using Lemma 2. The hypotheses
of (16) together with (18) above taken at $h = m$ yield the inequality $|e_{m+1}| < \alpha$. So,
once the conditions (16) hold for some $m$, they hold for $m + 1$; hence, for all $h \ge m$.

The fact that, under these conditions, there is convergence, $e_h \to 0$, now results from
unfolding the recurrence (8): we find, for all $h \ge m$,

    $|e_{h+1}| \le \beta^{h-m+1} |e_m| + \sum_{j=0}^{h-m} \beta^j \left(\frac{|z|^2}{\rho}\right)^{h-j} \le \beta^{h-m+1} |e_m| + h \max\left\{\beta, \frac{|z|^2}{\rho}\right\}^h,$

where Lemma 2 has been used again to bound the Pólya term. The additional assertion
that $|e_h| \le C h \beta_0^h$ in (17) finally follows from choosing $\beta_0 := \max(\beta, |z|^2/\rho)$. □

We can now state the main convergence result of this section: 

Proposition 1 (Convergence in "tubes"). For any angle $\eta > 0$, there exists a tube $T(\mu, \eta)$
with width $\mu > 0$, such that $|e_h(z)| \to 0$, as $h \to \infty$, uniformly for $z$ in $T(\mu, \eta)$.

Proof. We thus start from a fixed $\eta$, assumed to be suitably small. If we exclude a small
sector of opening angle $2\eta$ around the positive real axis, then the quantity

    $A_0 := \sup \{|y(z)| ;\ |z| = \rho,\ |\arg(z)| \ge \eta\}$

satisfies $A_0 < 1$: this results from the strong triangle inequality (see also the "Daffodil
Lemma" of [19]) and the fact that $y(\rho e^{i\theta})$ is a continuous function of $\theta$. (By the argument
introduced in the proof of Lemma 1, the function $y(z)$ is analytic at all points of $|z| = \rho$,
$z \ne \rho$, hence continuous.) Fix then $\epsilon$ by $A_0 = 1 - 2\epsilon$. By continuity of $y$ again, for
each $z$ on the circle of radius $\rho$ satisfying $|\arg(z)| \ge \eta$, there exists a small open disc $\delta(z)$,
centred at $z$ and such that $|y(\zeta)| < 1 - \epsilon$ for all $\zeta \in \delta(z)$. From now on, we assume that
the discs $\delta(z)$ are taken small enough, so that they are entirely contained in the larger disc
$\{w \in \mathbb{C} : |w| < \sqrt{\rho}\}$.

² This domain will sometimes be referred to as the "cardioid-like" domain, as it contains the disc $\{|z| \le \rho\}$
punctured at $\rho$ (Proposition 1) and has a cusp at $z = \rho$, associated to the square root singularity of $y(z)$ at $\rho$.

NICOLAS BROUTIN AND PHILIPPE FLAJOLET

We can then make use of the convergence criterion of Lemma 3, supplemented once
more by a continuity argument. In the notations of (16), choose first $\alpha = \epsilon$, then
$\beta = 1 - \epsilon/2$. For all sufficiently large $m$, say $m \ge \nu$, the last two conditions of (16) are
satisfied. Then, since the $e_h(z)$ are analytic (hence continuous) at every point of the
circle $|z| = \rho$ punctured at $\rho$, there exists, around each $z$ on $|z| = \rho$ with $|\arg(z)| \ge \eta$, a small
open disc $\delta_1(z) \subset \delta(z)$ and an integer $M(z)$ such that $|e_m| < \alpha$ for all $m \ge M(z)$. We
may also freely assume that $M(z) \ge \nu$.

Finally, by compactness of the arc $\{\rho e^{i\theta}\}$ defined by $|\theta| \ge \eta$, there exists a covering
of the arc by a finite collection of small discs, say $\{\delta_1(z_j)\}_j$. The union of these small
discs must then contain a tube of angle $\eta$ and width $\mu > 0$. By design, in this tube, all three
conditions of the convergence criterion of Lemma 3 (Equation (16)) are now satisfied, with
$m = \max_j M(z_j)$. □



3. Convergence near the singularity in a sandclock 



We now focus on the behaviour of $e_h(z)$ in a "sandclock" around the singularity. When
$z$ approaches $\rho$, the quantity $|y|$ is no longer bounded away from 1, so that the criterion for
convergence obtained earlier (Lemma 3) cannot be used directly. We then need to proceed
in two stages: first, we prove in Subsection 3.1 that, in a suitable sandclock, the initial terms
decay "enough"; next, in Subsection 3.2, we establish the existence of a sandclock where
convergence of the $e_h$ to 0 is ensured, as expressed by the main Proposition 2 below.
We shall then be able to build upon these results in the next section and derive suitable
singular approximations of the $e_h$ outside of the original disc of convergence $|z| \le \rho$
of $y(z)$, when $z$ is near $\rho$.

Alternative recurrence. So far, we have operated with the main recurrence (8) relating
the $e_h$, then applied some partial unfolding supplemented by simple continuity arguments.
To proceed with our programme, we need to adapt a classical technique in the study of
slowly convergent iterations near an indifferent fixed point [11, p. 153], which simply
amounts to "taking inverses" and leads to a useful alternative form of the original
recurrence.

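Lemma 4 (Alternative recurrence). Assume, for a value $z$, the conditions

    $e_i(z) \ne 0$ and $e_i(z) \left[1 - e_i(z^2)/e_i(z)^2\right] \ne 2y(z)$, for $i = 0, \ldots, h - 1$.

Then, the following two recurrence relations hold:

(19)    $\frac{y^h}{e_h} = \frac{1}{e_0} + \frac12 \sum_{i=0}^{h-1} y^{i-1}\, \frac{1 - e_i(z^2)/e_i^2}{1 - \frac{e_i}{2y} \left[1 - e_i(z^2)/e_i^2\right]},$

(20)    $\frac{y^h}{e_h} = \frac{1}{e_0} + \frac{1}{2y}\, \frac{1 - y^h}{1 - y} - \sum_{i=0}^{h-1} y^{i-1}\, \frac{e_i(z^2)}{2 e_i^2} + \sum_{i=0}^{h-1} \frac{y^{i-2} e_i}{4}\, \frac{\left[1 - e_i(z^2)/e_i^2\right]^2}{1 - \frac{e_i}{2y} \left[1 - e_i(z^2)/e_i^2\right]}.$

The form (19) is referred to as the simplified alternative recurrence; the form (20) is the
extended alternative recurrence.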
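Proof. Starting with the recurrence relation (8), rewritten as

    $e_{i+1} = y\, e_i \left(1 - \frac{e_i}{2y} \left[1 - \frac{e_i(z^2)}{e_i^2}\right]\right),$

the trick is to take inverses (cf. also [18]). The identity $(1 - u)^{-1} = 1 + u(1 - u)^{-1}$ implies

    $\frac{y^{i+1}}{e_{i+1}} = \frac{y^i}{e_i} + \frac{y^{i-1}}{2} \left[1 - \frac{e_i(z^2)}{e_i^2}\right] \left(1 - \frac{e_i}{2y} \left[1 - \frac{e_i(z^2)}{e_i^2}\right]\right)^{-1}.$

Summing the terms of this equality for $i = 0, \ldots, h - 1$ then yields the first version. The
extended version follows from the expansion $(1 - u)^{-1} = 1 + u + u^2 (1 - u)^{-1}$. □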
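The "taking inverses" step can be sanity-checked numerically. Writing the main recurrence (8) as $e_{i+1} = y\, e_i (1 - u_i)$ with $u_i = \frac{e_i}{2y}[1 - e_i(z^2)/e_i^2]$, inversion and telescoping give $y^h/e_h = 1/e_0 + \frac12 \sum_{i<h} y^{i-1} [1 - e_i(z^2)/e_i^2]/(1 - u_i)$. The sketch below evaluates both sides in floating point at a real point inside the disc of convergence; the sample point $z = 0.3$, the truncation order, the depth and all names are illustrative assumptions.

```python
import math


def otter_counts(n_max):
    """y[n] = number of trees with n external nodes (y[0] is padding)."""
    y = [0] * (n_max + 1)
    y[1] = 1
    for n in range(2, n_max + 1):
        pairs = sum(y[k] * y[n - k] for k in range(1, n))
        sym = y[n // 2] if n % 2 == 0 else 0
        y[n] = (pairs + sym) // 2
    return y


def check_alternative_recurrence(z=0.3, h=12, depth=8, n_terms=60):
    """Return (lhs, rhs) of the telescoped identity for y^h / e_h at a real z."""
    coeffs = otter_counts(n_terms)
    y_z2 = sum(coeffs[n] * (z * z) ** n for n in range(1, n_terms + 1))  # y(z^2)
    y_z = 1 - math.sqrt(1 - 2 * z - y_z2)                                # y(z), by (4)
    pts = [z ** (2 ** j) for j in range(depth + 1)]
    vals = pts[:]                                   # y_0(w) = w at w = z, z^2, z^4, ...
    total = 0.0
    for i in range(h):
        e_i = y_z - vals[0]                         # e_i(z)
        e_i_sq = y_z2 - vals[1]                     # e_i(z^2), the Polya term
        bracket = 1 - e_i_sq / e_i ** 2
        total += y_z ** (i - 1) * bracket / (1 - e_i / (2 * y_z) * bracket)
        new = [pts[j] + (vals[j] ** 2 + vals[j + 1]) / 2 for j in range(depth)]
        new.append(pts[depth])                      # y_h(w) ~ w at the tiny last point
        vals = new
    lhs = y_z ** h / (y_z - vals[0])
    rhs = 1 / (y_z - z) + total / 2
    return lhs, rhs


print(check_alternative_recurrence())
```

Both numbers should agree up to rounding, since the identity is an exact algebraic consequence of (1) and (6).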



Lemma 4 is used to complete the proof of Lemma 5 below (see Equation (31)) and it
serves as the starting point of the proof of Proposition 2 (see Equation (33)). It then proves
central in establishing the main approximation of Proposition 3 in the next section. The
interest of these alternative recurrences is that they relate the inverse $1/e_h$ to essentially
polynomial forms in the previous $e_i$. In particular they serve to convert lower bounds into
upper bounds, and vice versa.

3.1. Initial behaviour of $e_h$. We establish in this subsection (cf. Lemma 6) that the
quantities $|e_h(z)|$ first exhibit a decreasing behaviour for $h \le N$, with some appropriate
$N = N(z)$. At that point, $|e_N(z)|$ appears to be small enough to guarantee that the
criterion of Lemma 3 becomes applicable, whence eventually the convergence $|e_h(z)| \to 0$ as
$h \to \infty$ in a sandclock.

The following preparatory lemma serves to control the effect of Pólya terms, when $z$ is
close to $\rho$, so that $z^2$ is close to $\rho^2$, well inside of the disc of radius $\rho$. It is evocative of the
theory of iteration near an attractive fixed point (see, e.g., [34, Ch. 8]).

Lemma 5 (Smooth iteration for Pólya terms). Fix $z_0 \in (0, \rho)$. There exists a constant
$R_0 > 0$, dependent upon $z_0$, such that, for all $h \ge 0$, and for all $z$ satisfying $|z - z_0| < R_0$,
one has

    $e_h(z) = C_h(z) \cdot y(z)^h,$

where, uniformly with respect to $z$, $C_h(z) = C(z) + o(1)$, as $h \to \infty$, and $C(z)$ is analytic
at $z_0$. Furthermore, for some $K_1, K_2, c_0$ all positive, one has³, in the disc $|z - z_0| < R_0$,

    $K_1 < |C_h(z)| < K_2$ and $|\arg(e_h(z))| \le c_0 (h + 1) |z - z_0|.$

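Proof. Starting from the main relation (8) and unfolding only the $e_h$ that is factored, we
obtain by induction

(21)    $\frac{e_{h+1}}{y^{h+1}} = e_0 \prod_{i=0}^{h} \left(1 - \frac{e_i}{2y}\right) + \frac{e_h(z^2)}{2 y^{h+1}} + \sum_{i=0}^{h-1} \frac{e_i(z^2)}{2 y^{i+1}} \prod_{j=i+1}^{h} \left(1 - \frac{e_j}{2y}\right).$

We let $C_h(z) := e_h(z)/y(z)^h$ and proceed to prove properties of these quantities.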

(i) Upper bound on $C_h$ and existence of $C(z)$. When $z$ lies in a small enough
neighbourhood of $z_0 \in (0, \rho)$, the convergence of $e_h$ to zero is geometric by Lemma 2, and it
remains so, uniformly with respect to $z$ restricted to a small neighbourhood of $z_0$.
Furthermore, the inequality $|y(z)| > |z|$, which holds at $z = z_0$, persists, by continuity, for $z$ in
a suitably small neighbourhood of $z_0$. It follows that both the product and the sum in the
right-hand side of (21) converge geometrically and uniformly, so that $C_h(z) \to C(z)$ as
$h \to \infty$, where $C(z)$ is analytic at $z_0$. These arguments also imply that $|C_h(z)|$ remains
bounded from above by an absolute constant: $|C_h(z)| < K_2$.



³ The argument of a complex number $w \ne 0$ is taken to be the number $\theta \in (-\pi, +\pi]$ such that $w = |w| e^{i\theta}$.
(ii) Lower bound on $C_h$. We next observe that, in a small enough neighbourhood
of $z_0$, the quantity $|C(z)|$ must be bounded from below. Indeed, a contrario, if this was
not the case, then we would need to have $C(z_0) = 0$. Now, because of the convergence
of $C_h(z)$ to $C(z)$, we would have $C_h(z_0) = o(1)$, implying $e_h(z_0) = o(y(z_0)^h)$. This last
fact is finally seen to contradict Equation (21), since the left-hand side taken at $z_0$ would
tend to 0, while the right-hand side remains bounded from below by the positive quantity
$e_0 \prod_{i \ge 0} (1 - e_i/(2y))$ taken at $z_0$. A contradiction has been reached. Thus, we must have
$|C(z)| > K_1$ for some $K_1 > 0$; hence the claimed inequality $|C_h(z)| > K_1$ for all $h$ large
enough, say $h \ge h_0$. (For $h < h_0$, we can complete the argument by referring again to
Equation (21), which precludes the possibility that $e_h(\zeta) = 0$ for $\zeta \in (0, \rho)$. A continuity
argument then provides a small domain around $z_0$ where $C_j(z)$ is bounded from below, for
all $j \in \{1, \ldots, h_0\}$.)

(iii) Bound on the argument. Finally, the argument of $e_h$ can be expressed as follows:

(22)    $\arg e_h = \Im(\log e_h) = \Im(\log C_h) + h\, \Im(\log y) \pmod{2\pi}.$

We now consider a disc $|z - z_0| < R$ and momentarily examine the effect of letting
$R \to 0$. By analyticity of $y(z)$ at $z_0$ and since $y(z_0)$ is positive real, we have $\Im(\log y(z)) = O(R)$.
Next, since $|C_h(z)|$ is bounded from above and below in a small enough fixed
neighbourhood of $z_0$, $C_h(z_0)$ is positive real, and $C_h(z) \to C(z)$, we have, similarly,
$\Im(\log C_h(z)) = O(R)$, where the implied constant in $O(\cdot)$ can be taken independent of $h$.
This means that there exist constants $d_0, d_1 > 0$ such that, provided $R$ is chosen small
enough, one has $|\arg(e_h(z))| < d_0 R + d_1 R h$. This last form implies the stated bound on
the argument of $e_h$. □

With Lemma 5 in hand, we can obtain a first set of properties of $e_h(z)$, which hold for
$z$ in a sandclock $S(r_0, \theta_0)$ and for $h$ "not too large". These will be used in Proposition 2 to
derive an upper bound on $|e_N|$ (for some suitably chosen $N$ depending on $z$), to the effect
that $e_N$ eventually satisfies the criterion of Lemma 3. In the following, we only need to
consider $z \in S(r_0, \theta_0)$ with $\Im(z) > 0$, since we clearly have $e_h(\bar{z}) = \overline{e_h(z)}$, where $\bar{z}$
denotes the complex conjugate of $z$.

Lemma 6 (Initial behaviour of $e_h$). Suppose $\Im(z) > 0$. Define the integer

(23)    $N(z) := \left\lfloor \frac{\arccos(1/4)}{\arg y(z)} \right\rfloor.$

Fix $\theta_0 < \pi/8$, with $\theta_0 > 0$. There exists a constant $r_0 > 0$ such that, if $z$ lies in the sandclock
$S(r_0, \theta_0)$, then, for all $h$ such that $1 \le h \le N(z)$, the following inequalities hold:

(24)    $\frac{1}{2(h+1)} \le |e_h(z)| \le \frac12$ and $0 \le \arg(e_h) \le (h + 2) \arg(y).$

Furthermore, one has $|e_h(z)| < 1/5$, for $6 \le h \le N(z)$.

Observe that we can also assume, in a small enough sandclock,

(25)    $\frac12 < |e_0(z)| < \frac23,$

since $e_0(\rho) = 1 - \rho$ has numerical value $\doteq 0.59730$ and $e_0(z)$ is continuous at $z = \rho$.

Proof. As a preamble, we note that $N(z)$ tends to infinity as $z \to \rho$, since $y(\rho) = 1$ is real,
hence has argument 0. Consider next the basic recurrence relation (8) rewritten as

(26)    $\frac{e_{h+1}}{y} = y\, \frac{e_h}{y} \left(1 - \frac12 \frac{e_h}{y}\right) + \frac{e_h(z^2)}{2y}.$
The behaviour of the first term in the right-hand side of (26) is dictated by properties of the
mapping

(27)    $g : w \mapsto w(1 - w/2).$

(A very similar function appeared in the analysis of Flajolet and Odlyzko [18, Lemma 3].)
By a simple modification of the proof in [18], we can check elementarily the implication

(28)    $\left\{ |w| \le 1,\ 0 \le \arg w \le \arccos(1/4) \right\} \implies \left\{ |g(w)| \le |w|,\ 0 \le \arg g(w) \le \arg w \right\}.$
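The elementary implication for $g(w) = w(1 - w/2)$, namely that $|w| \le 1$ and $0 \le \arg w \le \arccos(1/4)$ force $|g(w)| \le |w|$ and $0 \le \arg g(w) \le \arg w$, can be probed on a grid. The sketch below is a numerical spot check on sample points, not a proof; the grid size and the rounding tolerance are arbitrary choices. The threshold $\arccos(1/4)$ is visible in the computation $|1 - w/2|^2 = 1 - |w| \cos(\arg w) + |w|^2/4$, which is at most 1 exactly when $\cos(\arg w) \ge |w|/4$.

```python
import cmath
import math


def g(w):
    """The map w -> w (1 - w/2) governing the first term of the recurrence."""
    return w * (1 - w / 2)


def check_implication(samples=200, eps=1e-12):
    """Test |g(w)| <= |w| and 0 <= arg g(w) <= arg w on a polar grid of the sector."""
    phi_max = math.acos(0.25)
    for a in range(1, samples + 1):            # moduli in (0, 1]
        for b in range(samples + 1):           # arguments in [0, arccos(1/4)]
            w = cmath.rect(a / samples, phi_max * b / samples)
            gw = g(w)
            if abs(gw) > abs(w) + eps:
                return False
            if not (-eps <= cmath.phase(gw) <= cmath.phase(w) + eps):
                return False
    return True


print(check_implication())
```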

(i) Weak upper bounds on modulus and bounds on argument. We are first going to use
(28) and induction on $h$ (with $1 \le h \le N(z)$), in order to establish a suitably weakened
form of (24); namely,

(29)    $|e_h/y| < 1$ and $0 \le \arg(e_h/y) \le (h + 1) \arg y.$

We start with the basis of the induction relative to (29), the case $h = 1$, where
$e_1 = y - z - z^2$. Observe that $e_1(\rho) = 1 - \rho - \rho^2 \doteq 0.43$, so that $|e_1(z)| < 1/2$ (and, a
fortiori, $|e_1/y| < 1$) is granted for $z$ close enough to $\rho$. Write next $z/\rho = 1 + r e^{i\theta}$, with $\theta$
close to $\pi/2$ and $r$ a small positive number. Then, by virtue of the singular expansion (2)
of Lemma 1, we have

    $y(z) = 1 + i \lambda \sqrt{r}\, e^{i\theta/2} + O(r),$

as $r \to 0$, hence

    $\frac{e_1}{y} = 1 - \rho - \rho^2 + i \lambda (\rho + \rho^2)\, e^{i\theta/2} \sqrt{r} + O(r).$

Since $\theta/2$ now lies in $(\pi/4 - \pi/16, \pi/4 + \pi/16)$, there results from the last expansion
that the argument of $e_1/y$ is essentially a small positive multiple of $\sqrt{r}$. A precise
comparison of the arguments of $y$ and $e_1/y$, as provided by the last two displayed equations,
confirms (routine details omitted) that we can choose a small enough $r_0$ such that, in the
sandclock $S(r_0, \pi/8)$, we have both $|e_1/y| < 1$ and $0 \le \arg(e_1/y) \le 2 \arg(y)$.

Suppose now that (29) holds for all integers up to $h < N(z)$. In order to determine
whether it also holds for $h + 1$, we have to take into account the Pólya term, that is, the
second term in the right-hand side of (26). By possibly further restricting $r_0$, we can
guarantee that, for all $z \in S(r_0, \pi/8)$, this second term does not contribute any increase in
the argument of $e_h/y$. Indeed, observe that for $z \in S(r_0, \pi/8)$, we have $\arg(y) \ge \delta r^{1/2}$,
with some $\delta > 0$. In addition, by Lemma 5, Equation (22), we have
$\arg e_h(z^2) \le c_0 (h + 1) |z^2 - \rho^2|$, where $|z^2 - \rho^2| = O(r)$ is of a smaller order than $O(\sqrt{r})$. Thus, in (26),
the second (Pólya) term on the right-hand side of the equality has an argument which is of
order $hr$, and, for $r$ small enough, may be taken to satisfy

    $0 \le \arg(e_h(z^2)/(2y)) \le (h/2) \cdot \arg(y).$

Now, the simple geometry of parallelograms implies that two complex numbers $\zeta$ and
$\zeta'$ whose arguments lie in $[0, \pi/2]$ satisfy $\arg(\zeta + \zeta') \le \max(\arg(\zeta), \arg(\zeta'))$. There results,
from the induction hypothesis, the chain of inequalities

    $0 \le \arg(e_{h+1}/y) \le \max\{\arg(e_h/y) + \arg(y),\ \arg(e_h(z^2)/y)\}$
    $\le \max\{(h + 1) \arg y + \arg(y),\ (h/2) \arg(y)\}$
    $\le (h + 2) \arg(y).$

Note that the first inequality follows from the use of (28). In particular, this step requires
that $\arg(e_h/y)$ be lower than $\arccos(1/4)$, which we can only guarantee as long as our upper
bound $(h + 1) \arg(y)$ is itself at most $\arccos(1/4)$. This is why we proceed with the
induction only as long as $h < N(z)$. At this stage, the induction is complete and (29) is
established for $h \le N(z)$.
(ii) Improved upper bound on modulus. The upper bound on the modulus provided by (29), being (slightly) weaker than the upper bound on $|e_h|$ asserted in (24), needs to be strengthened. The functions $y(z)$ and $e_h(z)$ are analytic, hence continuous, in the domain $\mathcal{D}$ of (15) and all the sandclocks it contains. Also, we have seen that $e_1(\rho) < 1/2$, while, by definition, $e_1(\rho) > e_2(\rho) > \cdots$. So, after possibly restricting $r_0$ to a smaller value once more, for all $z \in S(r_0, \pi/8)$, the inequality $|e_h(z)| < 1/2$ is guaranteed to hold with $h = 1, \ldots, 6$, this by virtue of continuity. Next, if $h \ge 6$, the alternative recurrence and the fact that $|e_h/y| < 1$ (asserted in (29) and proved in Part (i) above) imply, via the triangle inequality,

$$\left|\frac{e_{h+1}}{y}\right| \le |y|\,\bigl|g(e_h/y)\bigr| + \left|\frac{e_h(z^2)}{2y}\right|, \qquad\text{where}\quad g(e_h/y) = \frac{e_h}{y}\cdot\left(1 - \frac{e_h}{2y}\right)$$

($g(w)$ is as defined in (27)). Now, for $h < N(z)$, the quantity $g(w)$ is taken at $w = e_h/y$, which is such that $|w| < 1$ and $\arg(w) < \arccos(1/4)$, so that, by (28),

$$\left|\frac{e_{h+1}}{y}\right| \le \left|\frac{e_6(z)}{y}\right| + \frac{1}{2}\sum_{i=6}^{h}\left|\frac{e_i(z^2)}{y}\right| \le \left|\frac{e_6(z)}{y}\right| + K\sum_{i=6}^{h}\left(\frac{|z|^2}{\rho}\right)^{i}, \tag{30}$$

for some constant $K$; here the last line makes use of the inequality $|e_i(z^2)| < (|z|^2/\rho)^i$ granted by Lemma 2. It follows easily that, for $h \ge 6$,

1 o 6 

\e h +x/y\ < |e 6 (p)| + r • ^— + 2K^¥, 

2 1 — p 

for all $r < r_0$, with $r_0$ chosen small enough. In particular, for $h \in [6, N(z)]$ and $r < r_0$ small enough, we have $|e_h/y| < 1/5$. Combined with the previous observations regarding the initial values of $e_j(z)$, this implies the inequality $|e_h/y| < 1/2$ for all $h \le N(z)$, as asserted.

(iii) Lower bound on modulus. It finally remains to establish the lower bound on $|e_h|$ in (24). We start with the recurrence relation (26). For $h < N(z)$, the additional Pólya term $e_h(z^2)$ only contributes to making $|e_{h+1}|$ larger. Indeed, for $z \in S(r_0, \theta_0)$, by Lemma 5 and the upper bound on arguments proved in Part (i), both terms are such that, for $h < N(z)$,

$$\left|\frac{e_{h+1}}{y}\right| \ge |y|\cdot\left|\frac{e_h}{y}\right|\left(1 - \frac{1}{2}\left|\frac{e_h}{y}\right|\right).$$

Since $x \mapsto x(1 - x/2)$ is increasing in $[0, 1]$, we have $|e_h/y| \ge f_h$ for all $h \ge 0$, where the sequence $(f_h)_{h\ge0}$ is defined by

$$f_0 = \frac{|e_0|}{|y|} = \frac{|y - z|}{|y|} \qquad\text{and}\qquad f_{h+1} = |y|\,f_h\left(1 - \frac{f_h}{2}\right).$$

(The latter recurrence relation is precisely the one analysed by Flajolet and Odlyzko [18] in the case of simply generated trees.) For $r_0$ small enough, a process analogous to the derivation of the alternative recurrence in Lemma 6 yields

,„,s \v\ h 1 1 sr~^ , ,, 1 fi 1 3 v— I , „• „ 3/i 

(31) l = /o + 2-^ |y| + 2-^r^-^' ^^ + 2-^ |y| ^ 2 + y 

J/l ' /u i=0 i=0 J%l JV i=0 

This last bound directly implies the lower bound on $|e_h|$ asserted in (24). □
3.2. Existence of a convergence sandclock. We can now turn to a proof of the main result of this section, Proposition 2 stated below, which establishes the existence of a sandclock in which the $e_h$ converge to $0$. This proof follows the lines of the analogous statement [18, Proposition 4], where the iteration is "pure". In the present context, we need once more to control the effect of the Pólya terms, which can be done thanks to an easy auxiliary result, Lemma 7.

Lemma 7. There exist $r_0, \theta_0 > 0$ small enough, so that, for $z \in S(r_0, \theta_0)$ and for all $h$ satisfying $0 \le h \le N$, with $N = N(z)$ as specified in (23), one has

$$\frac{|e_h(z^2)|}{|e_h(z)^2|} < \frac{1}{2}\left(\frac{4}{5}\right)^{h}.$$

Proof. Set $z = \rho + re^{i\theta}$, with $z \in S(r_0, \theta_0)$, for $r_0, \theta_0 > 0$ taken small enough, which will be successively constrained, as the need arises. The inequality

$$\frac{|e_h(z^2)|}{|e_h(z)^2|} \le \frac{4(h+1)^2}{|y|^2}\left|\frac{z^2}{\rho\,y^2}\right|^{h}, \qquad 0 \le h \le N, \tag{32}$$

combines the upper bound on $e_h(z^2)$ provided by Lemma 2 (with $z$ in the statement to be replaced by $z^2$) and the lower bound on $e_h(z)$ guaranteed by Lemma 6.

Now, at $z = \rho$, the upper bound (32) takes the form $4(h+1)^2\rho^h$, where $\rho \approx 0.40269$, so that its decay is about $0.4^h$. By continuity of the exponential rate $|z^2/(\rho y^2)|$, there exists a small sandclock such that the decrease is less than $4(h+1)^2\,0.45^h$ (say). Furthermore, we verify easily that this last quantity is less than $\frac12\cdot 0.8^h$ for all $h \ge 13$. Thus, the statement is established for $h$ large enough ($h \ge 13$). On the other hand, examination of initial values shows that the ratio $e_j(\rho^2)/e_j(\rho)$ decreases rapidly from a value of about $0.0543$, at $j = 0$, to about $7.8\cdot10^{-10}$, at $j = 12$; furthermore, we observe numerically that $2\cdot 0.8^{-j}\,e_j(\rho^2)/e_j(\rho)$ is always less than $1/9$, for $j = 0, \ldots, 12$. Thus, by continuity again, in a small enough sandclock, we must have $|e_j(z^2)/e_j(z)| < \frac12\cdot 0.8^j$ for $j = 0, \ldots, 12$. □
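
The numerical claims in this proof can be re-derived with a few lines of Python. The sketch below is illustrative only and rests on two assumptions that are not restated in this section: that $y$ satisfies the functional equation $y(z) = z + \frac12\bigl(y(z)^2 + y(z^2)\bigr)$, and that recurrence (8) reads $e_{h+1}(z) = e_h(z)\bigl(y(z) - e_h(z)/2\bigr) + e_h(z^2)/2$ with $e_0 = y - z$. The helper names (`y`, `find_rho`, `e`) are hypothetical.

```python
import math
from functools import lru_cache

def y(z):
    """y(z) on 0 <= z < rho, solving y = z + (y^2 + y(z^2))/2 for y."""
    if z < 1e-4:
        return z + z * z + z ** 3 + 2 * z ** 4  # series head, error O(z^5)
    return 1.0 - math.sqrt(1.0 - 2.0 * z - y(z * z))

def find_rho():
    """Bisection on 1 - 2r - y(r^2) = 0, which expresses y(rho) = 1."""
    lo, hi = 0.3, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - 2.0 * mid - y(mid * mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

RHO = find_rho()

@lru_cache(maxsize=None)
def e(h, z):
    """e_h(z) = y(z) - y_h(z) through the assumed recurrence (8)."""
    yz = 1.0 if z == RHO else y(z)   # y(rho) = 1 exactly
    if h == 0:
        return yz - z
    prev = e(h - 1, z)
    return prev * (yz - prev / 2.0) + e(h - 1, z * z) / 2.0

# Ratios e_j(rho^2) / e_j(rho) for j = 0..12, and the rescaled quantity
# 2 * 0.8^(-j) * ratio that the proof compares with 1/9.
ratios = [e(j, RHO * RHO) / e(j, RHO) for j in range(13)]
for j in (0, 12):
    print(j, ratios[j], 2 * 0.8 ** (-j) * ratios[j])

# Smallest h from which 4(h+1)^2 0.45^h < 0.5 * 0.8^h holds for good
# (checked up to h = 199 only).
first_ok = next(h for h in range(200)
                if all(4 * (k + 1) ** 2 * 0.45 ** k < 0.5 * 0.8 ** k
                       for k in range(h, 200)))
print(first_ok)
```

Working with the recurrence on $e_h$ rather than subtracting $y_h$ from $y$ avoids the loss of digits that a direct difference would incur once $e_{12}(\rho^2)$ falls to about $10^{-10}$.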

Proposition 2 (Convergence in a "sandclock" around $\rho$). There exist constants $r_0, \theta_0 > 0$ such that the sequence $\{e_h(z),\ h \ge 0\}$ converges to zero for all $z$ in the sandclock $S(r_0, \theta_0)$.

Proof. It suffices to verify that, for $h = N = N(z)$ as specified in Equation (23), the quantity $e_N$ satisfies the convergence criterion of Lemma 3, which then grants us convergence of the $e_j$ to $0$ for $j \ge N$. For this purpose, we appeal to the alternative recurrence stated in Lemma 4



e h -2yl-y + e ^ 2ef + 4y 2 { <± 

M * ^ 



1 



and devise an asymptotic lower bound for the right-hand side. (Observe that we can indeed use the relation since, by Lemmas 6 and 7, for all $i = 0, \ldots, N$, the denominators do not vanish.)

Write $1 - y(z) = \varepsilon e^{it}$. As in Lemma 6, we assume without loss of generality that $\Im(z) > 0$. We need to establish properties of the various quantities which intervene in (33); this, in a small sandclock, that is, for small $\varepsilon > 0$ and $t$ close to $-\pi/4$. The following expansions are valid uniformly for $t \in [-\pi/4 - \delta, -\pi/4 + \delta]$ with $0 < \delta < \pi/4$ when $\varepsilon \to 0$:

$$\begin{aligned} 1 - |y| &= \varepsilon\cos t + O(\varepsilon^2), & N(z) &= -\varphi/(\varepsilon\sin t) + O(1),\\ \arg(y) &= -\varepsilon\sin t + O(\varepsilon^2), & 1 - |y|^N &= 1 - e^{\varphi\cot t} + O(\varepsilon), \end{aligned} \tag{34}$$

where $\varphi := \arccos(1/4)$.
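
As a quick plausibility check of these expansions, one can compare both sides numerically for a small $\varepsilon$. The sketch below is only an illustration: it takes $N(z) = \lfloor \varphi/\arg y(z) \rfloor$ as a stand-in for the definition (23), which is not restated in this section, and the function name `expansions` is hypothetical.

```python
import cmath
import math

PHI = math.acos(0.25)  # phi = arccos(1/4)

def expansions(eps, t):
    """Pairs (exact value, first-order expansion) for y = 1 - eps * e^{it}."""
    y = 1.0 - eps * cmath.exp(1j * t)
    arg_y = cmath.phase(y)
    n_z = math.floor(PHI / arg_y)          # assumed form of N(z)
    return {
        "1-|y|":   (1.0 - abs(y),        eps * math.cos(t)),
        "arg(y)":  (arg_y,               -eps * math.sin(t)),
        "N(z)":    (n_z,                 -PHI / (eps * math.sin(t))),
        "1-|y|^N": (1.0 - abs(y) ** n_z, 1.0 - math.exp(PHI / math.tan(t))),
    }

result = expansions(1e-3, -math.pi / 4)
for name, (exact, approx) in result.items():
    print(f"{name:8s} exact={exact:.6g} approx={approx:.6g}")
```

With $\varepsilon = 10^{-3}$ and $t = -\pi/4$, the first two discrepancies are of order $\varepsilon^2$, the one on $N(z)$ is bounded, and the last is of order $\varepsilon$, in line with the stated error terms.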

The first term $M$ of the right-hand side of (33) will be seen to bring the main contribution. It satisfies, as $\varepsilon \to 0$:



|M| 



1 1 



y 



N 



1 

2~M 



li 



2y l-y 
Next, regarding the term A, we have 

AT-l 

I 

\A\< 



> 



ii- \y 



N 



1 



D (/?-cot i 



2 II 



1 



y l e,(z 2 ) 



2y ^ e,(z)2 



i=0 



< 



1 

%j 



JV-l 



i=0 



2c 



e,(z 2 ) 



0(1), 



0(1), 



since, by Lemma 7, the summands decrease geometrically.

It now remains to analyse $B$. We split it further: by Lemma 7, for all $i$ such that $18 \le i \le N$, we have $|e_i(z^2)|/|e_i(z)^2| < 1/100$. Then, by Lemma 6, and for $\varepsilon$ small enough, we obtain



\B\ < 



1/5 



' 101 1 
. 100/ 



1 



\y 



N 



4(1 -e 
It follows that 



1 - 



1Z?_3 4(1 -e) 



2-2e 2 



1 _ V5 101 1 
± 2-2c 100 



< 22 



3 

50 



1 



\y 



N 



(35) 



\y N \ 1 — e v ' cot f 



> 



50 cos t 



0(1 ,4i 



where the last inequality holds for all $z \in \mathcal{D}$ such that $\varepsilon < \varepsilon_0$ and $|t + \pi/4| < \delta_0$, as soon as both $\varepsilon_0$ and $\delta_0$ are small enough.

We can now return to the criterion for convergence (Lemma 3) and verify that in a small enough sandclock the conditions in (16) are satisfied for $m = N$ and some well chosen parameters $\alpha$ and $\beta$. Equation (35) provides the required upper bound on $e_N$, which fixes our choice for $\alpha$:



|ejv| < a := 



£ . e <p cot t 

^ g^-COt t 



We now focus on the second condition in Lemma 3. From (34) and (35) we have, for $\varepsilon > 0$ small enough,



\y\ 



< i 



cost 



„(^-cot t 
^ gip-cot 



+ Q(e 2 



Next, one can verify that there exists $\delta_0 > 0$ such that for all $t \in [-\pi/4 - \delta_0, -\pi/4 + \delta_0]$, we have



cos t 



1 



gip-cott 2 



Thus, for all $\varepsilon > 0$ small enough, we can choose $\beta \in (0, 1)$, so that the first two conditions in Lemma 3 are satisfied; namely,



(36) 



|ejv| < a 



e ■ e 



if cot t 



2 1 - ev-' 



and 



o 



1 



One then easily verifies that the third condition also holds for small enough $\varepsilon > 0$: here, $\alpha(1 - \beta) = \Omega(\varepsilon^2)$, and, by (34), we have $(|z|^2/\rho)^N = o(\varepsilon^2)$. So for $\varepsilon$ small enough, $(|z|^2/\rho)^N < \alpha(1 - \beta)$. This shows that the criterion for convergence of Lemma 3 is satisfied with the values of $\alpha$ and $\beta$ specified in (36). As a consequence, $e_h(z) \to 0$ for all $z$ in a small enough sandclock. □
4. Main approximation

In this section, we develop precise quantitative estimates of $e_h(z)$ near the singularity $\rho$ and in a sandclock; these estimates serve as the main ingredient required for developing limit laws for height in the next section.

Proposition 3 (Main estimate for $e_h$ in a sandclock). There exist $r_1, \theta_1 > 0$ and $K, K' > 0$ such that for all $z \in S(r_1, \theta_1)$ and all $h \ge 1$,

$$\frac{y^h}{e_h} = \frac12\,\frac{1 - y^h}{1 - y} + R_h(z), \qquad\text{where}\quad |R_h(z)| \le K\min\left(\log\frac{1}{1 - |y|},\ \log(1 + h)\right). \tag{37}$$

Furthermore, $|R_h - R_{h+1}| \le K'/h$.

In order to prove this proposition, we need a better control on error terms, which can be achieved by extending the bounds obtained in Section 3 to $h > N$, knowing now that the $e_h$ converge (Proposition 2). The proof requires the bounds to be uniform both in the distance to the singularity $|z - \rho|$ and in the height $h$, as expressed by Lemmas 8 and 9 below. The bound (38) below serves as a useful complement to the lower bound in (24), which only holds for $h \le N$.

Lemma 8 (Uniform lower bound for $|e_h|$). For any $\delta \in (0, 1)$, there exist constants $r_1, \theta_1 > 0$ such that if $z \in S(r_1, \theta_1)$ then one has

$$|e_h(z)| \ge \frac{(1 - \delta)^h}{2(h + 1)}, \qquad\text{for all } h \ge 0. \tag{38}$$

Proof. Let $\delta \in (0, 1)$. We have $|y| > 1 - \delta/4$ provided $r := |z - \rho| < r_0$ is small enough. Then, by Lemma 6, the estimate (38) holds for $h \le N$.

We thus only need to consider now the case $h > N$. Assume further that $z \in S(r_0, \theta_0)$, as in Proposition 2. Then, $|e_h| \le \alpha$, for $\alpha$ as in (36). The recurrence relation (8) implies

$$|e_{h+1}| \ge |y|\,|e_h|\left(1 - \frac{|e_h|}{2|y|}\right) - \frac{|e_h(z^2)|}{2} \ge |y|\left(1 - \frac{\alpha}{2|y|}\right)|e_h| - \frac{|e_h(z^2)|}{2}. \tag{39}$$

However, by (36), we have $|y| + \alpha/2 < 1$ so that $|y| - \alpha/2 > 1 - \delta/2$. Lemma 2, which serves to bound the Pólya term $|e_h(z^2)|$, then yields $|e_{h+1}| \ge (1 - \delta/2)\,|e_h| - (\rho + r)^h$. Therefore, dividing both sides of the recurrence relation by $(\rho + r)^{h+1}$, we obtain for $h \ge N$,

$$\frac{|e_{h+1}|}{(\rho + r)^{h+1}} \ge \left(\frac{1 - \delta/2}{\rho + r}\right)\frac{|e_h|}{(\rho + r)^h} - \frac{1}{\rho + r}. \tag{40}$$

The remainder of the proof then consists in extracting the desired bound (38) from the latter relation by unfolding the recurrence from $h$ down to $N$. To this effect, recall that, by Lemma 6, $|e_N| \ge |y|^{N+1}/(2(N + 1))$ and $N \le K/\sqrt{r}$, for some constant $K$. Hence, we can set $r$ to a value small enough that

$$\frac{|e_N|}{(\rho + r)^N} \ge \frac{2}{\delta} \qquad\text{and}\qquad \frac{1 - \delta}{\rho + r} \ge 1. \tag{41}$$

Then, for such $r$, using (40) and (41), it is easily verified by induction on $h$ (with $h \ge N$) that $|e_h|/(\rho + r)^h \ge 2/\delta$. Using this last bound in (40) gives, for $h \ge N$:

$$\frac{|e_{h+1}|}{(\rho + r)^{h+1}} \ge \left(\frac{1 - \delta}{\rho + r}\right)\frac{|e_h|}{(\rho + r)^h} \ge \left(\frac{1 - \delta}{\rho + r}\right)^{h+1-N}\frac{|e_N|}{(\rho + r)^N}.$$

We can finally recover the information on $|e_h|$ by means of the lower bound for $|e_N|$ in Lemma 6. For all $h \ge N$, we then have

$$|e_h| \ge |e_N|\,(1 - \delta)^{h-N} \ge \frac{|y|^{N+1}(1 - \delta)^{h-N}}{2(N + 1)} \ge \frac{(1 - \delta)^{h}}{2(h + 1)},$$

and the proof is complete. □

We can now develop a uniform upper bound for $|e_h|$ when $z \in S(r_1, \theta_1)$.

Lemma 9 (Uniform upper bound for $|e_h(z)|$). There exist constants $r_1, \theta_1 > 0$ and $c_1 > 0$ such that, for any $h \ge 1$, and $z \in S(r_1, \theta_1)$, we have

$$|e_h(z)| \le \frac{c_1}{h}.$$

Proof. Write $1 - y = \varepsilon e^{it}$ for some $\varepsilon > 0$ and $t$. It suffices to prove that the result holds for all such $z$ provided $\varepsilon$ is small enough and $t$ close enough to $-\pi/4$. Observe that $\varepsilon$ is of the order of $\sqrt{|z - \rho|}$.

Our starting point is again (33), which we now use to obtain an upper bound on $|e_h|$. The first term $M$ is such that



11 -y h 



_ j,h | 1 In.l /l 1 \n,\h 

(42) |M| 



2y l-y 



I \l-y h \ > l-\y\ h l-\y\ 



2\y\ \l-y\ ~ 2\l-y\ 2e 



On the other hand, for all $h \ge 0$ and $\varepsilon > 0$ small enough, by Lemmas 2 and 8, the first error term $A$ in (33) satisfies



'' 1 e,(, 2 ) 



1 1 



s H + 2(MS 4,i + 1,, i(i-f 



There exists $\varepsilon_1 > 0$ such that for all $\varepsilon < \varepsilon_1$ the geometric term in the series above is at most $2\rho < 1$; together with the fact that $e_0 = y - z = 1 - \rho + O(\varepsilon)$, this implies that $|A| \le 11/(1 - 2\rho)^3$.

We now bound the second error term $B$ in (33). Note first that, for all $\varepsilon$ small enough, we have $|e_i| < 1/2$ for all $i \ge 0$: for $i \le N$, this is implied by Lemma 6, while for $i > N$ we have $|e_i| \le \alpha < 2(1 - |y|) = O(\varepsilon)$. Furthermore, by Lemmas 2 and 8, for all $\varepsilon < \varepsilon_1$ small enough, $|e_i(z^2)|/|e_i(z)^2| < 1/100$ for $i \ge h_0$, with $h_0$ depending only on $\varepsilon_1$. It follows that

\b\ < ^o + -- (101 ( 1 To ) 1 2 - E \vr 2 < 

9 8 1 l . 101 i» I — o u 1 c; i _ 

1 8 1 - 1 too t=ho+1 1 5 1 \y\ 

As a consequence, using Lemma 4, and combining the bounds just obtained on \A\ and 
\B\ with (42), one sees that, for all h > 0, 

(43)^>^^-^V^o- 7 ^>^^o 11 



2 5costy u (l-2p) 3 5e u (l-2p) 3 ' 

for $|\pi/4 + t|$ small enough.

The relation above provides a decent upper bound on $|e_h|$ provided that $|y|^h$ is small enough. With this in mind, we now prove an upper bound on $|y^h|$ for all $h \ge 0$. First, when $h$ is not too large, $|y|^h$ should decrease at least linearly in $h$: we show that for some small enough $\delta > 0$, $|y|^h \le 1 - \delta h\varepsilon$ for all $h \le N$. For some fixed $z$, the sequence $(|y|^h, h \ge 0)$ is convex; thus if $|y|^{N'} \le 1 - \delta N'\varepsilon$ for some $N' \ge N$, then $|y|^h \le 1 - \delta h\varepsilon$ for all $0 \le h \le N$. Recall that $\varphi = \arccos(1/4)$; we now prove that we might take $N' := -2\varphi/(\varepsilon\sin t)$. By (34), for $\varepsilon$ small enough, $|y| \le 1 - \frac{\varepsilon}{2}\cos t$ and $N \le N'$ and

$$|y|^{N'} \le \left(1 - \frac{\varepsilon\cos t}{2}\right)^{N'} \le \exp\left(-\frac{\varepsilon\cos t}{2}\,N'\right) = e^{\varphi\cot t}.$$

However, for $|t + \pi/4| < 1/100$, then $e^{\varphi\cot t} < 1/2$, so that we can pick $\delta > 0$ such that

$$e^{\varphi\cot t} \le 1 + \frac{2\delta\varphi}{\sin t} = 1 - \delta N'\varepsilon.$$

It follows by (43) that there exists $\delta > 0$ small enough such that $|e_h| \le 10/(\delta h)$, for $0 \le h \le N'$, for all $|\pi/4 + t| < 1/100$ and $\varepsilon > 0$ small enough.

On the other hand, if $h \ge N$ and $\varepsilon > 0$ is small enough and $|\pi/4 + t| < 1/100$, then $1 - |y|^h \ge 1 - 2e^{\varphi\cot t} > 1/4$ by (34). As a consequence,

$$|e_h| \le 10\,\varepsilon\,|y|^h \le 40\,\varepsilon\,\bigl(1 - \varepsilon\cos t + O(\varepsilon^2)\bigr)^h \le 40\,\varepsilon\,(1 - \varepsilon/2)^h,$$

for all $\varepsilon$ small enough and $t$ close enough to $-\pi/4$. Now, seen as a function of $\varepsilon$, the maximum of the right-hand side above is obtained for $\varepsilon = 2/(h + 1)$, which implies that $|e_h| \le 80/(h + 1)$, for $h \ge N$. Finally, by Lemma 6, and the bounds above, the result follows by choosing $c_1 = \max\{h_0, 10/\delta, 80\}$. □
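
The last optimisation step is elementary: differentiating $\varepsilon \mapsto \varepsilon(1 - \varepsilon/2)^h$ gives a single critical point at $\varepsilon = 2/(h+1)$. A small grid search, given here purely as an illustration (the function names are hypothetical), confirms both the location of the maximum and the resulting bound $80/(h+1)$.

```python
def upper_bound(eps, h):
    """Right-hand side 40 * eps * (1 - eps/2)^h of the bound on |e_h|."""
    return 40.0 * eps * (1.0 - eps / 2.0) ** h

def grid_argmax(h, steps=20000):
    """Maximise upper_bound over a uniform grid of the interval (0, 2)."""
    grid = [2.0 * (i + 0.5) / steps for i in range(steps)]
    best = max(grid, key=lambda eps: upper_bound(eps, h))
    return best, upper_bound(best, h)

for h in (5, 50, 500):
    eps_star, value = grid_argmax(h)
    print(h, eps_star, 2.0 / (h + 1), value, 80.0 / (h + 1))
```

The maximal value equals $\frac{80}{h+1}\bigl(\frac{h}{h+1}\bigr)^h$, which is why $80/(h+1)$ is a valid, slightly generous, upper bound.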

Proof of Proposition 3. The proof consists in using Lemma 9 above to bound the error terms in (33) for $z \in S(r_2, \theta_2)$, with $r_2 = \min\{r_0, r_1\}$ and $\theta_2 = \min\{\theta_0, \theta_1\}$. For some constants $c_2$ and $c_3$, we have



|A| + \B\ < + c 2 (l + E ^) < c 3 min {log (^) , 1 + log h] 



which proves the main statement of Proposition 3. Finally, since A and B are partial sums, 
we obtain 



\Rh — Rh+l\ 



2 e?l (z) 2 



2y 



e h {zf 



e h {zf 

a quantity which is easily seen to be uniformly $O(1/h)$, thanks to Lemmas 8 and 9. □

5. Asymptotic analysis and distribution estimates 

The basis of our estimates relative to the distribution of height is the main approximation of $e_h$ in Proposition 3, which is valid in a fixed sandclock at $\rho$. Given its importance, we repeat it under the simplified form:

$$e_h(z) \equiv y(z) - y_h(z) \approx 2\,\frac{1 - y}{1 - y^h}\,y^h. \tag{44}$$

(Here, the symbol $\approx$ is to be loosely interpreted in the sense of "approximately equal".) This approximation acquires a precise meaning when $z$ remains fixed and $h$ tends to infinity, in which case it expresses the geometric convergence of $e_h(z)$ to $0$ (since $|y| < 1$); also, when $h$ remains fixed and $z$ tends to $\rho$, it reduces to the numerical approximation $e_h(\rho) \approx 2/h$, whose accuracy increases with increasing values of $h$. In other words, the precise version of (44) provided by Proposition 3 consistently covers, in a uniform manner, the case when both $z \to \rho$ and $h \to \infty$. (Analogues of the formula (44) surface in the case of general plane trees in [12], plane binary trees [18], and labelled Cayley trees in [38].)

The exploitation of the enhanced versions of (44) relies on Cauchy's coefficient formula (10). The contour $\gamma$ in Cauchy's integral (10) will be comprised of several arcs and







line segments that lie outside of the disc $|z| < \rho$ and taken in the union of a suitable sandclock (as granted by Proposition 3) and of a tube, overlapping with the sandclock (where properties of Proposition 1 are in effect). The strategy just described belongs to the general orbit of singularity analysis methods expounded in [19, Ch. VI-VII]. We propose to apply it to the height-related generating functions $e_h(z)$ (weak limit, Theorem 1) and $e_{h-1}(z) - e_h(z)$ (local limit law, Theorem 2).

Before proceeding with the proof of Theorem 1, recall that we aim at showing that for any fixed $x > 0$, we have

$$\lim_{n\to\infty} \mathbb{P}\bigl(H_n > \lambda^{-1}x\sqrt{n}\bigr) = \Theta(x), \qquad \lambda := \sqrt{2\rho + 2\rho^2\,y'(\rho^2)},$$

where

$$\Theta(x) := \sum_{k\ge1}\,(k^2x^2 - 2)\,e^{-k^2x^2/4}.$$
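
For orientation, the constants $\rho$ and $\lambda$ entering this statement can be evaluated numerically. The following sketch is not part of the original argument; it assumes the functional equation $y(z) = z + \frac12\bigl(y(z)^2 + y(z^2)\bigr)$, from which $y'(z)\,(1 - y(z)) = 1 + z\,y'(z^2)$ follows by differentiation, and the helper names are hypothetical.

```python
import math

def y(z):
    """y(z) on 0 <= z < rho, solving y = z + (y^2 + y(z^2))/2 for y."""
    if z < 1e-4:
        return z + z * z + z ** 3 + 2 * z ** 4        # series head
    return 1.0 - math.sqrt(1.0 - 2.0 * z - y(z * z))

def dy(z):
    """y'(z) from y'(z) * (1 - y(z)) = 1 + z * y'(z^2)."""
    if z < 1e-4:
        return 1.0 + 2 * z + 3 * z * z + 8 * z ** 3   # derivative of the head
    return (1.0 + z * dy(z * z)) / (1.0 - y(z))

def find_rho():
    """Bisection on 1 - 2r - y(r^2) = 0, which expresses y(rho) = 1."""
    lo, hi = 0.3, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - 2.0 * mid - y(mid * mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

rho = find_rho()
lam = math.sqrt(2.0 * rho + 2.0 * rho ** 2 * dy(rho ** 2))

def mean_height_estimate(n):
    """First-order estimate (2 / lambda) * sqrt(pi * n) of E[H_n] (Theorem 3)."""
    return (2.0 / lam) * math.sqrt(math.pi * n)

print(rho, lam, mean_height_estimate(100))
```

The values obtained are $\rho \approx 0.40270$ and $\lambda \approx 1.13003$, so that the height of a random tree with $n$ leaves is of order $3.137\sqrt{n}$ on average.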

Proof of Theorem 1. We aim at using Cauchy's formula (10) with a well-chosen integration contour $\gamma$ (see footnote 5). The reader should consult Figures 2 and 3. First, we choose a priori a sandclock $S$, whose existence is granted by Proposition 2 and such that the approximation properties of Proposition 3 hold. By design, this sandclock contains in its interior a small arc of the circle $\{|z| = \rho\}$. Choose arbitrarily a point $z_0$ on this small arc, with $z_0 \ne \rho$, $\Im(z_0) > 0$, and set $z_0 = \rho + r_2e^{i\pi/2 + i\theta_0}$. Proposition 1 guarantees the existence of a tube $T$ that has $z_0$ in its interior and for which the convergence $e_h \to 0$ is ensured. We have
now determined a sandclock and a partially overlapping tube, whose union will be seen to 
contain the contour 7 (where — > 0) and whose intersection contains z = r 2 e °. 



In order to have well-defined determinations of square roots, one may think of the two segments as in fact 
joined by an infinitesimal arc of a circle that passes to the left of the singularity p. 

5 It might be that none of the tubes corresponding to Proposition 1 includes points to the right of the vertical line $\Re(z) = \rho$, hence the need to insert "joins" $\gamma_4$ and $\gamma_5$. (The discussion of this case was inadvertently omitted from the earlier version [7].) An alternative would be to make use of a contour that is squeezed in between the circle $|z| = \rho$ and the vertical line $\Re(z) = \rho$ (this is done in [38], where the circle itself is used); but then the near stationarity of the modulus of the Cauchy kernel, $|z|^{-n}$, makes it technically harder, or at least less transparent, to translate approximations of generating functions into coefficient estimates.
The contour $\gamma$ is essentially a Hankel contour escaping from $\rho$ along rectilinear portions $\gamma_1$ and $\gamma_2$ such that



7i = 72 



where $\theta_1$ is chosen positive and strictly less than the half-angle of the sandclock $S$. By design, the segments $\gamma_1$ and $\gamma_2$ lie entirely inside the sandclock $S$.

The component $\gamma_3$ of the contour is a subarc of the circle

$$C_n := \{z : |z| = \rho_n\}, \qquad\text{where}\quad \rho_n := \rho\left(1 + \frac{\log^2 n}{n}\right).$$

Precisely, let $z_1 = z_1(n)$ be the intersection point in the upper half-plane of the circle $C_n$ and the circle $\{|z - \rho| = r_2\}$. When $n$ gets large, this point $z_1$ comes closer and closer to $z_0$, so that, for all $n$ large enough, it must belong to the intersection $S \cap T$. In other words, we can write

$$z_1 = \rho_n e^{i\eta_2} = \rho + r_2e^{i\pi/2 + i\theta_2},$$

where $\eta_2 = \eta_2(n)$ and $\theta_2 = \theta_2(n)$ both depend on $n$ and tend to finite limits as $n \to +\infty$ (in particular, $\theta_2 \to \theta_0$). Then we take

$$\gamma_3 = \{\rho_n e^{i\theta} : \theta \in [\eta_2, 2\pi - \eta_2]\},$$

and for $n$ large enough, the arc $\gamma_3$ entirely lies in the tube $T$.

We can finally complete the contour to make it connected, with joining arcs $\gamma_4$ and $\gamma_5$, which are arcs of $\{|z - \rho| = r_2\}$ defined by

$$\gamma_4 = \overline{\gamma_5} = \{\rho + r_2e^{i\pi/2 + i\theta} : \theta \in [\theta_1, \theta_2]\},$$

so that both arcs lie inside the sandclock $S$.

Outer circular arc $\gamma_3$. By Proposition 1, we have $e_h(z) \to 0$ uniformly on $\gamma_3$ as $h \to \infty$. In particular, all moduli $|e_h(z)|$ are bounded by an absolute constant $K$ (see footnote 6). On the other hand the Cauchy kernel $z^{-n}$ is small on the contour, so that

$$\left|\int_{\gamma_3} e_h(z)\,\frac{dz}{z^{n+1}}\right| \le K_1\,\rho^{-n}\exp\bigl(-\log^2 n\bigr). \tag{45}$$

Join portions $\gamma_4, \gamma_5$. By Proposition 2, one has $e_h \to 0$ uniformly on $\gamma_4 \cup \gamma_5$ as $h \to \infty$. In particular $|e_h(z)| < K_2$ for some absolute constant $K_2$. By definition, for all $z \in \gamma_4 \cup \gamma_5$, $|z| \ge \rho_n$ so that, for the same reasons as in (45),

$$\left|\int_{\gamma_4\cup\gamma_5} e_h(z)\,\frac{dz}{z^{n+1}}\right| \le K_3\,\rho^{-n}\exp\bigl(-\log^2 n\bigr). \tag{46}$$

Outer rectilinear parts of $\gamma_1$ and $\gamma_2$. Let $\mathcal{D}_n := \{|z - \rho| > \delta_n\}$, with

$$\delta_n := \frac{\log^2 n}{n}.$$

Note that for $z \in \gamma_1 \cap \mathcal{D}_n$, we have $|z| \ge \rho + \delta_n\sin\theta_1$. For the same reason as before,

$$\left|\int_{(\gamma_1\cup\gamma_2)\cap\mathcal{D}_n} e_h(z)\,\frac{dz}{z^{n+1}}\right| \le K_4\,\rho^{-n}\exp\bigl(-K_5\log^2 n\bigr). \tag{47}$$

The total contribution of the outer circular arc $\gamma_3$, of both join portions $\gamma_4$ and $\gamma_5$, and of the outer rectilinear parts $\gamma_1 \cap \mathcal{D}_n$, $\gamma_2 \cap \mathcal{D}_n$ is thus exponentially small compared to $y_n$, hence totally negligible in the present context.

6 In what follows, we use generically $K, K_1, \ldots$ to denote absolute positive constants, not necessarily of the same value at different occurrences.
Inner rectilinear parts of $\gamma_1$ and $\gamma_2$. This is where action takes place. From now on, we operate with the normalization

$$h = \lambda^{-1}x\sqrt{n},$$

where $x$ is taken to range over a fixed compact interval of $\mathbb{R}_{>0}$. We now focus on the portions of $\gamma_1$ and $\gamma_2$ lying outside $\mathcal{D}_n$. We denote them by $\tilde\gamma_1$ and $\tilde\gamma_2$, respectively, and note that all their points are at a distance from $\rho$ that tends to $0$, as $n \to +\infty$. Our objective is to replace $e_h$ by the simpler quantity

$$\hat e_h(z) \equiv \hat e_h := 2\,\frac{1 - y}{1 - y^h}\,y^h, \tag{48}$$

as suggested by Proposition 3. Along $\tilde\gamma_1, \tilde\gamma_2$, the singular expansion of $y(z)$ applies, so that $1 - y = O((\log n)/n^{1/2})$ and the error term $R_h(z)$ from Equation (37) is $O(\log n)$. There results that $(1 - y^h)/(1 - y)$ is always at least as large in modulus as $K_5\sqrt{n}/\log n$ (this, by a study of the variation of $|1 - e^{-h\tau}|/|1 - e^{-\tau}|$), and we have



(49) 



y_ = v_ 
eh eh 
It proves convenient to define 



1 + 



log 2 n 



(50) E(h,n) 
and to make the change of variables 

(51) z = p(l 



2l1T ./ 7l U7 2 



dz 

yn+l ■ 



dz 



-dt. 



With this rescaling, the point t then starts from — ip 1 n6 n e 1 x , loops to the right of the 
origin, then steers away to ip _1 n5„e* ei . Given the singular expansion of y(z) in (2), we 
have on the small arcs 71, 72, 



(52) 



= p- n eJ [1 + 



log 4 n 



y(z) = l-XJ-+0 



and, with h = X 1 x^/n and \t\ < Kq log 2 n, since 8 n — log 2 nj 



n: 



(53) y h = cxp ( -x\fi ) [1 + 



— exp 



-xVt) [1 + 



log 2 n 



We also find 7 , for the range of values of t corresponding to 71, 72: 



1 - exp(-xVt)(l + t/y/n) 
\Jtfn 



l + O 



log* n 



(54) 



1 — Cxp(—Xy/i) 



1 + 



l + O 
log* n 



log* n 



The approximations (52), (53), and (54) motivate considering, as an approximation of $E(h, n)$ in (50), the contour integral

$$J(X) := -\frac{1}{2i\pi}\int_{\mathcal{L}} \frac{\exp(-X\sqrt{t})}{1 - \exp(-X\sqrt{t})}\,\sqrt{t}\,e^{t}\,dt = -\frac{1}{2i\pi}\sum_{k\ge1}\int_{\mathcal{L}} \exp(-kX\sqrt{t})\,\sqrt{t}\,e^{t}\,dt, \tag{55}$$

where $\mathcal{L}$ goes from $-\infty + i\infty$ to $-\infty - i\infty$ and winds to the right of the origin.

7 The expression $\log^\star n$ represents an unspecified positive power of $\log n$.

We now make $J(X)$ explicit. Each integral on the right side of (55) can be evaluated by the change of variables $w = i\sqrt{t}$, equivalently, $t = -w^2$. By completing the square and flattening the image contour $\mathcal{L}'$ onto the real line, we obtain:

$$J(X) = \frac{1}{4\sqrt{\pi}}\sum_{k\ge1} e^{-k^2X^2/4}\,\bigl(k^2X^2 - 2\bigr). \tag{56}$$

From the chain of approximations in Equations (48) to (55), we are then led to expect the approximation

$$E(h, n) \sim 2\lambda\,\rho^{-n}n^{-3/2}J(x),$$

which is justified next.

Error management. In order to justify the replacement of $e_h$ by $\hat e_h$, following (49) and (50), we observe the estimate



(57) 



h I 1 -y\ \ dz \ _ ^ ( „-n lo s 4ri 



^ 2 1 N" +1 



0[p 



,3/2 



This results from the discussion of the lower bound on (1— y h )/(l— y) that follows (48), the 
inequality \y h \ < 1, and the fact that the length of the integration interval is 0(log n/n). 
The error in our approximation has three sources: the two successive replacements 

(58) ^nll^ 
and the integration on a finite contour. We have, for z £ 71 U 72 : 

Finally, the infinite extension of the contour only entails an additive error term of the form 

0(exp(— K log 4 n)), since 



00 

e~ 

log 2 n 



~dw = O(e- log "). 



This implies, for $h = \lambda^{-1}x\sqrt{n}$:

$$e_{h,n} \equiv [z^n]\,e_h(z) = 2\lambda\,\rho^{-n}n^{-3/2}J(x) + O\left(\rho^{-n}\,\frac{\log^\star n}{n^{2}}\right).$$

The explicit form of $J(X)$ in (56) and the asymptotic form of $y_n$ (Lemma 1) jointly yield the statement. □

The main message of the proof of Theorem 1 is twofold: (i) for any "reasonable" expression involving $e_h$, the estimation of the Cauchy coefficient formula can be limited to a small neighbourhood of $\rho$ (parts $\tilde\gamma_1$ and $\tilde\gamma_2$), since the other parts of the contour $\gamma$ have exponentially negligible contributions; (ii) the approximation provided by Proposition 3 and Equation (37) is normally sufficient to derive first-order asymptotic estimates.

The convergence in law expressed by Theorem 1 is illustrated by Figure 4. The proof of the theorem points to an error term, in the convergence to the limit, that is of the form $O((\log^a n)/\sqrt{n})$, with an unspecified exponent $a$. As a matter of fact, the value $a = 1$ is suggested by the logarithmic character of the error term in (37) of Proposition 3. Convergence is, at any rate, somewhat slow, a fact that is perceptible from Figure 4.
Figure 4. The normalized distribution functions $\mathbb{P}(H_n \le \lambda^{-1}x\sqrt{n})$, for $n = 10, 20, 50, 100, 200, 500$, as a function of $x$, and the limit distribution function $1 - \Theta(x)$, where $\Theta(x)$ is specified in Theorem 1.



On another register, the distribution function $1 - \Theta(x)$ belongs to the category of elliptic theta functions [42, Ch. XXI], which are of the rough form $\sum_k q^{k^2}e^{2ikz}$ and are well-known to satisfy transformation formulae [42, p. 475]. Regarding $\Theta(x)$, such formulae provide an alternative form, which we state for the density function, $\theta(x) := -\Theta'(x)$:



(59) 



0(3?) 



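Theorem 2 states that $H_n$ indeed satisfies a local limit law with density function $\theta(\cdot)$: for $x$ in a compact set of $\mathbb{R}_{>0}$ and $h = \lambda^{-1}x\sqrt{n}$ an integer, there holds uniformly

$$\mathbb{P}(H_n = h) \sim \frac{\lambda}{\sqrt{n}}\,\theta(x), \qquad\text{where}\quad \theta(x) = -\Theta'(x) = (2x)^{-1}\sum_{k\ge1}\bigl(k^4x^4 - 6k^2x^2\bigr)\,e^{-k^2x^2/4}.$$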
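
The limit law is easy to explore numerically. The sketch below is an illustration with hypothetical function names: it evaluates $\Theta$ and $\theta$ by truncating the series once $kx$ exceeds 40, checks by finite differences that $\theta = -\Theta'$, and recovers the first two moments $\int_0^\infty \Theta(x)\,dx = 2\sqrt{\pi}$ and $\int_0^\infty 2x\,\Theta(x)\,dx = 8\zeta(2)$, which correspond to the moment formulae of Theorem 3 after rescaling by $\lambda/\sqrt{n}$.

```python
import math

def theta_tail(x):
    """Theta(x) = sum_{k>=1} (k^2 x^2 - 2) exp(-k^2 x^2 / 4), for x > 0."""
    kmax = int(40.0 / x) + 1                  # later terms are below exp(-400)
    return sum((k * k * x * x - 2.0) * math.exp(-k * k * x * x / 4.0)
               for k in range(1, kmax + 1))

def theta_density(x):
    """theta(x) = (2x)^{-1} sum_{k>=1} (k^4 x^4 - 6 k^2 x^2) exp(-k^2 x^2 / 4)."""
    kmax = int(40.0 / x) + 1
    total = sum((k ** 4 * x ** 4 - 6.0 * k * k * x * x)
                * math.exp(-k * k * x * x / 4.0)
                for k in range(1, kmax + 1))
    return total / (2.0 * x)

def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

mean = midpoint(theta_tail, 0.0, 20.0, 2000)
second = midpoint(lambda x: 2.0 * x * theta_tail(x), 0.0, 20.0, 2000)
print(mean, 2.0 * math.sqrt(math.pi))
print(second, 4.0 * math.pi ** 2 / 3.0)       # 8 * zeta(2)
print(theta_tail(1.0), theta_tail(3.5), theta_density(3.5))
```

Note that the series cannot be integrated term by term for the mean (each term integrates to zero over $(0, \infty)$), which is one reason a numerical quadrature of the summed series is used here.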



Proof of Theorem 2. We abbreviate the discussion, since it is technically very similar to the proof of Theorem 1: only the approximations near $z = \rho$ differ. Proceeding in this way, based on Proposition 3, we can justify approximating the number of trees of height exactly $h$ and size $n$ by the integral

$$\frac{1}{2i\pi}\int_{\tilde\gamma_1\cup\tilde\gamma_2} \bigl(\hat e_{h-1} - \hat e_h\bigr)\,\frac{dz}{z^{n+1}},$$

with $\hat e_h$ as defined in (48). We have

$$\hat e_{h-1} - \hat e_h = 2\,y^{h-1}\,\frac{(1 - y)^2}{(1 - y^{h-1})(1 - y^h)}. \tag{60}$$

The approximations (53) and (54) then motivate considering the quantity

$$J_1(X) := -\frac{1}{2i\pi}\int_{\mathcal{L}} \frac{\exp(-X\sqrt{t})}{\bigl(1 - \exp(-X\sqrt{t})\bigr)^2}\,t\,e^{t}\,dt.$$

8 Do $q = e^{-x^2/4}$ and $z = 0$ to recover $\theta(x)$.

One then finds (with the auxiliary estimate $|R_h - R_{h+1}| = O((\log^\star n)/\sqrt{n})$ provided by Proposition 3):

$$y_{n,h} - y_{n,h+1} = 2\lambda^2\rho^{-n}n^{-2}J_1(x) + O\left(\rho^{-n}\,\frac{\log^\star n}{n^{5/2}}\right).$$

On the other hand, differentiation under the integral sign yields $J_1(X) = -J'(X)$, which proves the statement. □

Figure 5 displays the normalized histograms of the distribution of height and a plot of 
the corresponding limit density. 

Revisiting the proof of Theorems 1 and 2 shows that one can allow $x$ to become either small or large, albeit to a limited extent. Indeed, it can be checked, for instance, that allowing $x$ to get as large as $O(\sqrt{\log n})$ only introduces extra powers of $\log n$ in error estimates. However, such extensions are limited by the fact that the main theta term eventually becomes smaller than the error term. We state (compare with [20, Th. 1.1]):

Theorem 4 (Moderate deviations). There exist constants $A, B, C > 0$ such that for $h = (x/\lambda)\sqrt{n}$ with $A/\sqrt{\log n} \le x \le A\sqrt{\log n}$ and $n$ large enough, there holds

$$\left|\mathbb{P}\bigl(H_n > \lambda^{-1}x\sqrt{n}\bigr) - \Theta(x)\right| \le \frac{C}{n^{B}}. \tag{61}$$

In particular, if $x \to \infty$ in such a way that $x \le A\sqrt{\log n}$, then, uniformly,

$$\mathbb{P}\bigl(H_n > \lambda^{-1}x\sqrt{n}\bigr) \sim x^2e^{-x^2/4}.$$

Similar estimates hold for the local law. These estimates can furthermore be supplemented by (very) large deviation estimates in the style of [20, Th. 1.4]:

Theorem 5 (Very large deviations). There exists a continuous increasing function $I(u)$ satisfying $I(u) > 0$ for $0 < u \le 1$ and such that, given any fixed $\delta > 0$, one has for all $x \in [\delta, 1 - \delta]$ and all $n$

$$\mathbb{P}(H_n > xn) \le K\,n^{3/2}e^{-nI(x)},$$

where $K$ only depends on $\delta$.
where K only depends on 6. 
Proof. We propose to use saddle point bounds [19, p. 246]: for any $r \in (0, \rho)$, one has

$$\mathbb{P}\{H_n > h\} = \frac{[z^n]\,e_h(z)}{y_n} \le \frac{e_h(r)}{y_n\,r^{n}}. \tag{62}$$

The first step is to obtain an upper bound on $e_h(r)$, for $r \in (0, \rho)$. For such $r$, all terms in the recurrence relation (8) are non-negative and expanding the relation with the help of Lemma 2 yields, for all $h \ge 0$, the inequality

$$e_{h+1}(r) \le y(r)\,e_h(r) + \frac{e_h(r^2)}{2}, \qquad\text{hence}\qquad e_h(r) \le y(r)^h\left(e_0(r) + \frac{1}{2y(r)}\sum_{i=0}^{h-1}\left(\frac{r^2}{\rho\,y(r)}\right)^{i}\right).$$

However, it is easily verified that for all $r \in (0, \rho)$, we have $y(r) > r + r^2 + r^3 > r^2/\rho$. As a consequence, the series above converges and there exists a universal constant $K$ such that

$$e_h(r) \le K\,y(r)^h, \qquad\text{for } h \ge 0 \text{ and } r \in (0, \rho).$$

The last estimate, the saddle point bound (62), and Lemma 1 yield, in the region $h = xn$,

$$\mathbb{P}\{H_n > h\} \le K'\,n^{3/2}\left(\frac{\rho\,y(r)^x}{r}\right)^{n},$$

for some other universal constant $K'$ and for any $r \in (0, \rho)$.

The goal is now to make an optimal choice of the value of $r$. For $x$ kept fixed and regarded as a parameter, we consider

$$J(r, x) := \frac{y(r)^x}{r}$$

as a function of $r$, and henceforth abbreviated as $J(r)$. We have $J(0) = +\infty$ and $J(\rho) = \rho^{-1}$. The point, to be justified shortly, is that $J(r)$ decreases from $+\infty$ to some minimal value $J(\xi)$, when $r = \xi$; then it increases again to $\rho^{-1}$ for $r \in (\xi, \rho)$. In particular, we must have $J(\xi) < \rho^{-1}$, which suffices to imply a non-trivial exponential bound on the probabilities.

The unimodality of $J(r) = J(r, x)$ results from the usual convexity properties of generating functions (see [13] or [19, pp. 250 and 580]). Indeed it suffices to observe that the logarithmic derivative (all derivatives being taken with respect to $r$), namely,

$$\frac{J'(r)}{J(r)} = \frac{x}{r}\left(\frac{r\,y'(r)}{y(r)} - \frac{1}{x}\right),$$

is such that $r\,J'(r)/J(r)$ varies monotonically from $x - 1 < 0$ to $+\infty$, as $r$ varies from $0$ to $\rho$. This last fact is a consequence of the positivity of

$$\frac{d}{dr}\left(\frac{r\,y'(r)}{y(r)} - \frac{1}{x}\right),$$

itself granted, since $V = r\,\frac{d}{dr}\bigl(r\,y'(r)/y(r)\bigr)$ is the variance of a random variable $X$ with probability generating function $\mathbb{E}[u^X] = y(ru)/y(r)$.

In summary, from the preceding considerations, the system

$$I(x) = -x\log y(\xi) + \log\xi - \log\rho \qquad\text{with } \xi = \xi(x) \text{ such that } x\,\xi\,y'(\xi) - y(\xi) = 0$$

uniquely determines a function $I(x)$, for which $e^{-nI(x)} = (\rho J(\xi))^n$ and which precisely satisfies the properties asserted in Theorem 5. □
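
To get a feel for the rate function, the saddle point equation can be solved numerically. The sketch below is only an illustration under stated assumptions: it takes $I(x) = -\log(\rho\,J(\xi, x))$ with $J(r, x) = y(r)^x/r$, evaluates $y$ and $y'$ through the functional equation $y(z) = z + \frac12\bigl(y(z)^2 + y(z^2)\bigr)$, and uses hypothetical helper names.

```python
import math

def y(z):
    """y(z) on 0 <= z < rho, solving y = z + (y^2 + y(z^2))/2 for y."""
    if z < 1e-4:
        return z + z * z + z ** 3 + 2 * z ** 4        # series head
    return 1.0 - math.sqrt(1.0 - 2.0 * z - y(z * z))

def dy(z):
    """y'(z) from y'(z) * (1 - y(z)) = 1 + z * y'(z^2)."""
    if z < 1e-4:
        return 1.0 + 2 * z + 3 * z * z + 8 * z ** 3
    return (1.0 + z * dy(z * z)) / (1.0 - y(z))

def find_rho():
    """Bisection on 1 - 2r - y(r^2) = 0, which expresses y(rho) = 1."""
    lo, hi = 0.3, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - 2.0 * mid - y(mid * mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

RHO = find_rho()

def saddle(x):
    """xi in (0, rho) with x * xi * y'(xi) = y(xi), found by bisection."""
    lo, hi = 1e-9, RHO * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if x * mid * dy(mid) / y(mid) - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(x):
    """I(x) = log(xi / rho) - x * log y(xi), i.e. -log(rho * J(xi, x))."""
    xi = saddle(x)
    return math.log(xi / RHO) - x * math.log(y(xi))

for x in (0.2, 0.5, 0.8):
    print(x, saddle(x), rate(x))
```

The computed values are positive and increase with $x$, staying below $-\log\rho \approx 0.91$, which is the value the expression approaches as $x \to 1$.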

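Finally, the approximation of $e_h$ by $\hat e_h$ in (48) is good enough to grant us access to moments (cf. also [18]) stated in Theorem 3: as $n \to \infty$, we have

$$\mathbb{E}[H_n] \sim \frac{2}{\lambda}\sqrt{\pi n} \qquad\text{and}\qquad \mathbb{E}[H_n^r] \sim r(r-1)\,\zeta(r)\,\Gamma(r/2)\left(\frac{2}{\lambda}\right)^{r}n^{r/2}, \quad r \ge 2.$$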
Proof of Theorem 3. The problem reduces to estimating generating functions of the form

$$M_r(z) = 2(1-y)^2 \sum_{h \ge 1} h^r \frac{y^h}{(1-y^h)^2},$$

which are accessible to the Mellin transform technology [21], upon setting $y = e^{-t}$. If we
let $F_r(t) = \sum_{h \ge 1} h^r\, e^{-ht}/(1-e^{-ht})^2$, then the Mellin transform $F_r^*(s)$ is given by

$$F_r^*(s) = \zeta(s-r)\,\zeta(s-1)\,\Gamma(s),$$

and is valid in the fundamental strip $\Re(s) > r+1$. The information relative to the distribution is
concentrated around the singularity, hence for values of $y$ such that $y \to 1$, or equivalently
$t \to 0$. The asymptotics of $F_r(t)$ as $t \to 0$ correspond to the singular expansion of its
Mellin transform $F_r^*(s)$ to the left of the strip.

For $r \ge 2$, the main contribution is due to the simple pole at $s = r + 1$, which has
residue $\zeta(r)\Gamma(r + 1)$. It follows that

$$F_r(t) \sim \zeta(r)\,\Gamma(r+1)\,t^{-r-1}, \qquad r \ge 2.$$

Since $1 - y \sim \lambda\sqrt{1 - z/\rho}$, and $y = e^{-t}$, we have $t \sim \lambda\sqrt{1 - z/\rho}$ and

$$M_r(z) \sim 2\,\zeta(r)\,\Gamma(r+1)\,\lambda^{-r+1}\,(1 - z/\rho)^{-(r-1)/2}.$$
Singularity analysis theorems imply

$$[z^n] M_r(z) \sim 2\,\zeta(r)\,\lambda^{-r+1}\,\Gamma(r+1)\,\rho^{-n}\,\frac{n^{(r-3)/2}}{\Gamma((r-1)/2)}.$$

The duplication formula for the Gamma function, combined with the estimate for $y_n$, then
yields:

$$\mathbb{E}[H_n^r] \sim \frac{[z^n] M_r(z)}{y_n} \sim \left(\frac{2}{\lambda}\right)^{r} \zeta(r)\,r(r-1)\,\Gamma(r/2)\,n^{r/2}, \qquad r \ge 2.$$



When $r = 1$, the Mellin transform $F_1^*(s)$ has a double pole at $s = 2$ and the asymptotic
form of $F_1(t)$ at zero involves logarithmic terms. We then obtain, as $n \to \infty$,

$$\mathbb{E}[H_n] \sim 2\lambda^{-1}\sqrt{\pi n},$$

using similar arguments. □
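The leading-term estimate $F_r(t) \sim \zeta(r)\Gamma(r+1)t^{-r-1}$ used in the proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the harmonic sum is truncated where $e^{-ht}$ falls below $e^{-60}$ (an arbitrary cut-off), only $r = 2$ and $r = 4$ are treated because $\zeta(2)$ and $\zeta(4)$ have closed forms, and the function names are hypothetical.

```python
import math

ZETA = {2: math.pi**2 / 6, 4: math.pi**4 / 90}  # closed forms of zeta(2), zeta(4)

def F(r, t):
    """Harmonic sum F_r(t) = sum_{h>=1} h^r e^{-ht} / (1 - e^{-ht})^2 (truncated)."""
    h_max = int(60.0 / t) + 1
    total = 0.0
    for h in range(1, h_max + 1):
        one_minus_q = -math.expm1(-h * t)        # 1 - e^{-ht}, accurate for small ht
        total += h**r * math.exp(-h * t) / one_minus_q**2
    return total

def leading_term(r, t):
    """zeta(r) * Gamma(r+1) * t^{-(r+1)}, the contribution of the pole at s = r + 1."""
    return ZETA[r] * math.gamma(r + 1) * t ** (-(r + 1))

if __name__ == "__main__":
    for t in (0.1, 0.01):
        for r in (2, 4):
            print(f"r={r} t={t}: F/leading = {F(r, t) / leading_term(r, t):.8f}")
```

The ratios approach 1 as $t \to 0$; for $r = 2$ the visible deviation of order $t$ comes from the next pole, at $s = 2$.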

6. The diameter of unrooted trees 

In this section, we put to use the approximations of Section 4 in order to quantify ex- 
treme distances in random unrooted trees. Developments parallel those of Riordan [39], as 
regards formal generating functions, and especially Szekeres [41], as regards asymptotic 
developments. 

In the class $\mathcal{Y}$ of rooted binary trees, every node has total degree three or one, except
for the root, which has degree two. Consider now the class $\mathcal{U}$ of unrooted ternary trees
where each node has degree either three or one, without exception (no special root node is
now distinguished). Let $\mathcal{U}_n$ consist of the elements of $\mathcal{U}$ with $n$ nodes of degree one,
the leaves, which determine size, hence $(n - 2)$ nodes of degree three. Denote by $u_n$ the
number of such trees. The trees of $\mathcal{U}$ of size at most 8 are displayed in Figure 6. We write
the generating function of $\mathcal{U}$ as $u(z) := \sum_{n \ge 0} u_n z^n$, so that

$$u(z) = z^2 + z^3 + z^4 + z^5 + 2z^6 + 2z^7 + 4z^8 + 6z^9 + 11z^{10} + 18z^{11} + \cdots,$$

and the coefficients constitute sequence A000672 of Sloane's On-line Encyclopedia of
Integer Sequences [40].
Figure 6. The unlabelled trees of sizes from 2 to 8, with external nodes (leaves) represented by
squares. 



Using considerations about the dissimilarity characteristic of trees found in Otter's
work [35] and developed in [5, 25], we obtain

(63) $u(z) = z^2 + u'(z) - \frac{1}{2}\,y(z)^2 + \frac{1}{2}\,y(z^2),$

where $u'(z)$ is the generating function of unrooted trees with a distinguished node. (Note
that because of the special degree condition in rooted trees $u'(z) \ne y(z)$.) The distinguished
node might be a leaf or a node of degree three, which leads to

(64) $u'(z) = z\,y(z) + \frac{1}{6}\,y(z)^3 + \frac{1}{2}\,y(z^2)\,y(z) + \frac{1}{3}\,y(z^3).$

The equations (63) and (64) fully characterize $u(z)$ and, together with Lemma 1, they
determine the singular expansion of $u(z)$. The following classical lemma reduces to simple
manipulations based on Lemma 1, supplemented by routine singularity analysis of the
generating function.

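Lemma 10. The generating function $u(z)$ of unrooted ternary trees expands in a neighbourhood
of $\rho$ as follows

$$u(z) = \mu_0 + \mu_1\,(1 - z/\rho) + \frac{\lambda^3}{3}\,(1 - z/\rho)^{3/2} + O\!\left((1 - z/\rho)^2\right),$$

for some constants $\mu_0, \mu_1 \in \mathbb{R}$ and $\lambda = \sqrt{2\rho + 2\rho^2 y'(\rho^2)}$. Furthermore, the number $u_n$
of unrooted trees of size $n$ satisfies the asymptotic estimate

$$u_n \sim \frac{\lambda^3}{4\sqrt{\pi}}\,\rho^{-n}\,n^{-5/2}.$$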

We now turn to the analysis of the diameter of unrooted trees. A diameter in a graph or
a tree is any simple path of maximal length and we also refer to the common length of all
diameters as the diameter of the tree. Let $u_{d,n}$ be the number of unrooted trees on $n$ leaves
with diameter exactly equal to $d$, and let $u_d(z) = \sum_{n \ge 0} u_{d,n} z^n$ denote the associated
generating function$^{9}$. To simplify notations, we set

$$g_h(z) := e_{h-1}(z) - e_h(z),$$

which is the generating function of rooted unlabelled binary trees having height exactly $h$.

We have $u_1(z) = z^2$ and $u_2(z) = z^3$. Unrooted trees of size at least 4 may be recursively
decomposed into sets of rooted trees; the decomposition depends on the parity of the
diameter $d$. If $d = 2h + 1$ is odd, with $d \ge 3$, all diameters share a unique edge (bicentre)
that splits the tree into a pair of rooted trees of height exactly $h$ each, so that

(65) $u_{2h+1}(z) = \frac{1}{2}\,g_h(z)^2 + \frac{1}{2}\,g_h(z^2).$

On the other hand, trees with even diameter $d = 2h$, with $d \ge 4$, decompose into three
rooted trees around a central vertex (centre), with two of the trees of height exactly $h - 1$ and
a third subtree of height at most $h - 1$:

(66) $u_{2h}(z) = \frac{1}{6}\,g_{h-1}(z)^3 + \frac{1}{2}\,g_{h-1}(z^2)\,g_{h-1}(z) + \frac{1}{3}\,g_{h-1}(z^3) + \frac{1}{2}\,g_{h-1}(z)^2\,y_{h-2}(z) + \frac{1}{2}\,g_{h-1}(z^2)\,y_{h-2}(z).$

In this way, one can enumerate the trees of odd and even diameter (the "bicentred" and
"centred" trees), whose generating functions start, respectively, as

$$u_{\mathrm{odd}}(z) = z^2 + z^4 + z^6 + z^7 + 2z^8 + 2z^9 + 6z^{10} + 8z^{11} + \cdots$$
$$u_{\mathrm{even}}(z) = z^3 + z^5 + z^6 + z^7 + 2z^8 + 4z^9 + 5z^{10} + 10z^{11} + \cdots,$$

with coefficients forming sequences A000673 and A000675 of Sloane's Encyclopedia.
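The two expansions can be reproduced mechanically from (65) and (66) with exact truncated power series. The sketch below is an illustration, not the authors' procedure; it assumes that $e_h = y - y_h$, where $y_h$ is the generating function of rooted trees of height at most $h$, with $y_0 = z$ and $y_{h+1}(z) = z + \frac12 y_h(z)^2 + \frac12 y_h(z^2)$, so that $g_h = y_h - y_{h-1}$. The truncation order and helper names are arbitrary.

```python
from fractions import Fraction

N = 11  # all series are truncated after z^N

def monomial(k):
    s = [Fraction(0)] * (N + 1)
    s[k] = Fraction(1)
    return s

def add(*terms):
    return [sum(col, Fraction(0)) for col in zip(*terms)]

def scale(q, a):
    return [q * c for c in a]

def mul(a, b):
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(N + 1 - i):
                out[i + j] += ai * b[j]
    return out

def power_subst(a, k):
    """Return a(z^k), truncated after z^N."""
    out = [Fraction(0)] * (N + 1)
    for i in range(N // k + 1):
        out[i * k] = a[i]
    return out

HALF, THIRD, SIXTH = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
Z = monomial(1)

# at_most[h + 1] holds y_h(z) (height at most h); at_most[0] is y_{-1} = 0.
at_most = [[Fraction(0)] * (N + 1), Z]
for _ in range(N):
    prev = at_most[-1]
    at_most.append(add(Z, scale(HALF, mul(prev, prev)), scale(HALF, power_subst(prev, 2))))

def y_h(h):
    return at_most[h + 1]

def g(h):
    """g_h = y_h - y_{h-1}: rooted trees of height exactly h (assumed definition)."""
    return add(y_h(h), scale(-1, y_h(h - 1)))

u_odd = monomial(2)                      # u_1(z) = z^2
for h in range(1, N):                    # diameters 3, 5, ... via (65)
    gh = g(h)
    u_odd = add(u_odd, scale(HALF, mul(gh, gh)), scale(HALF, power_subst(gh, 2)))

u_even = monomial(3)                     # u_2(z) = z^3
for h in range(2, N):                    # diameters 4, 6, ... via (66)
    gp, yp = g(h - 1), y_h(h - 2)
    gp_sq, gp_z2 = mul(gp, gp), power_subst(gp, 2)
    u_even = add(
        u_even,
        scale(SIXTH, mul(gp_sq, gp)),
        scale(HALF, mul(gp_z2, gp)),
        scale(THIRD, power_subst(gp, 3)),
        scale(HALF, mul(gp_sq, yp)),
        scale(HALF, mul(gp_z2, yp)),
    )

odd_counts = [int(c) for c in u_odd[2:]]    # coefficients of z^2 .. z^11
even_counts = [int(c) for c in u_even[2:]]
total_counts = [a + b for a, b in zip(odd_counts, even_counts)]

if __name__ == "__main__":
    print("odd  :", odd_counts)
    print("even :", even_counts)
    print("total:", total_counts)
```

The sum of the two lists gives the coefficients of $u(z)$ from $z^2$ to $z^{11}$, which provides a check that is independent of the dissimilarity relation.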

We now turn to singular asymptotics in a $\Delta$-domain$^{10}$ (see [19, §VI.3] and Equation (5),
and Figure 3). As usual, the Pólya terms in (65), (66), which are the ones containing functional
terms involving $z^2$ or $z^3$, will turn out to be of negligible effect. Indeed, Lemma 2
guarantees, for $|z| < \sqrt{\rho}$:

(67) $|g_h(z^2)| \le e_{h-1}(|z|^2) \le \dfrac{1}{\sqrt{1 - |z|^2/\rho}} \left(\dfrac{|z|^2}{\rho}\right)^{h}.$

Thus, fixing some $R$ with $\rho < R < \sqrt{\rho}$, we have for some $C > 0$ and $A \in (0, 1)$:

(68) $|g_h(z^2)| < C \cdot A^h,$

whenever $z$ lies in a suitable $\Delta$-domain anchored at $\rho$, and the same bound on the right
of (68) obviously holds for $g_h(z^3)$. In other words, the Pólya terms involving $z^2$ and $z^3$
are exponentially small. This gives us, relative to (65) and (66) and for $z$ in a $\Delta$-domain,
the estimate

(69) $u_{2h+1}(z) = \frac{1}{2}\,g_h(z)^2 + O(A^h)$

and, similarly,

(70) $u_{2h}(z) = \frac{1}{2}\,g_{h-1}(z)^2\,y_{h-2}(z) + \frac{1}{6}\,g_{h-1}(z)^3 + O(A^h).$



$^{9}$ Here, $u_n = [z^n]u(z)$ is the number of trees of size $n$; we make use
of indices $d$, $2h$, $2h + 1$ for diameter and occasionally abbreviate $u_d(z), \ldots,$ as $u_d, \ldots,$ so that no ambiguity
should occur.

$^{10}$ To be precise, we only need to consider the part of a $\Delta$-domain that is interior to a $\gamma$-contour of the type
introduced in the previous section.
Figure 7. Left: the raw histograms of the distribution of diameter in unrooted trees, for $n =
50, 100, 150, \ldots, 500$. Right: a plot of the limit density function $\theta(x)$ of Theorem 6.



The latter asymptotic form may be further simplified: by Lemmas 1 and 9, for $z \to \rho$ in a
sandclock, we have

$$y - e_h = 1 - O\!\left(\sqrt{1 - z/\rho}\right) - e_h, \qquad |g_h| < |e_{h-1}| = O(1/h),$$

and it follows that, in this sandclock,

(71) $u_{2h}(z) = \frac{1}{2}\,g_{h-1}(z)^2 \left(1 + O(1/h) + O\!\left(\sqrt{1 - z/\rho}\right)\right).$

(The cubic term $\frac{1}{6}\,g_{h-1}(z)^3$ essentially corresponds to trees having a centre from which
there spring three trees of equal height; such configurations are still negligible, but now
polynomially, rather than exponentially.) Additionally, in a tube, all terms in (69) and (70)
are exponentially small, by virtue of Equation (17) of Lemma 3 and Proposition 1; the
induced contributions for coefficients are thus going to be exponentially small, and we do
not need to discuss these any further.

In a way similar to the asymptotic simplification (60) of $e_h - e_{h+1} = g_{h+1}$, the estimates
of (69) and (71) now suggest introducing the following approximation of $u_d$,

(72) $\tilde u_d := 2(1-y)^4\,\dfrac{y^d}{(1 - y^{d/2})^4},$

regardless of the parity of $d$: we have (in a sandclock)

(73) $u_d = \tilde u_d \left(1 + O(1/d) + O\!\left(\sqrt{1 - z/\rho}\right)\right).$

Following the line of proof of Theorems 1, 2, and 3, it is now a routine matter to work 
out the consequences, at the level of coefficients, of the main approximations (72) and (73). 
Note that, since we have access to generating functions of diameter exactly h, we start with 
a local limit law, then proceed to estimate the distribution function by summation. Figure 7 
presents supporting numerical data for the local limit law of diameter. 

Theorem 6 (Local limit law for diameter). The diameter $D_n$ of an unrooted tree sampled
from $\mathcal{U}_n$ uniformly at random satisfies a local limit law: for $x$ in any compact set of $\mathbb{R}_{>0}$,
uniformly, with $(x/\lambda)\sqrt{n}$ an integer, one has, as $n \to \infty$:

$$\mathbb{P}\{D_n = (x/\lambda)\sqrt{n}\} \sim \frac{\lambda}{\sqrt{n}}\,\theta(x),$$

where

$$\theta(x) = \frac{1}{768}\sum_{k \ge 1} k(k^2 - 1)\left(k^5 x^5 - 80k^3 x^3 + 960kx\right)e^{-k^2x^2/16}.$$

Proof. We start from the approximations (72) and (73), then make use of Cauchy's coefficient
formula together with the contour $\gamma$ specified in the proof of Theorem 1. As noted
already, the contributions of the outer circle $\gamma_3$, the joins $\gamma_4$ and $\gamma_5$ and the further portions
of the rectilinear pieces $\gamma_1$ and $\gamma_2$ are exponentially small, so that we can restrict attention
to what happens in a small sandclock, along $\gamma_1$ and $\gamma_2$.

The change of variable $z = \rho(1 - t/n)$ and approximations that are justified in Equations (50)
to (54) of the proof of Theorem 1 lead to (here with $d = x\sqrt{n}$)

$$[z^n]u_d(z) = -2\rho^{-n} n^{-3} \lambda^4 J_2(\lambda x/2) + O\!\left(\rho^{-n} n^{-7/2} \log^{*} n\right),$$

where we have set

$$J_2(X) := \frac{1}{2i\pi}\int_C \frac{e^{-2X\sqrt{t}}}{\left(1 - e^{-X\sqrt{t}}\right)^4}\,t^2 e^t\,dt,$$

with $C$ that goes from $-\infty + i\infty$ to $-\infty - i\infty$ and winds to the right of the origin. As in
Equations (55) and (56), we can make $J_2(X)$ explicit:

(74) $$\begin{aligned} J_2(X) &= \frac{1}{2i\pi}\sum_{k \ge 3}\frac{k(k-1)(k-2)}{6}\int_C e^{-X(k-1)\sqrt{t}}\,t^2 e^t\,dt \\ &= -\frac{1}{192\sqrt{\pi}}\sum_{k \ge 2} k(k^2 - 1)\left(k^5X^5 - 20k^3X^3 + 60kX\right)e^{-k^2X^2/4}. \end{aligned}$$

A normalization by $u_n$, as provided by Lemma 10, then yields the claim. □

Theorem 7 (Limit distribution of diameter). The diameter $D_n$ of an unrooted tree sampled
from $\mathcal{U}_n$ uniformly at random admits a limit distribution: for $x$ in a compact set of $\mathbb{R}_{>0}$,
we have

$$\lim_{n \to \infty} \mathbb{P}\{D_n \ge (x/\lambda)\sqrt{n}\} = \Theta(x),$$

where

$$\Theta(x) := \int_x^\infty \theta(w)\,dw = \frac{1}{96}\sum_{k \ge 1}(k^2 - 1)\left(k^4x^4 - 48k^2x^2 + 192\right)e^{-k^2x^2/16}.$$



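Proof. The convergence of distribution functions results from earlier approximations through
integration. Indeed, approximating a Riemann sum by the corresponding integral, we find,
for $d = x\sqrt{n}$,

$$[z^n]\sum_{\ell \ge d} u_\ell \sim [z^n]\sum_{\ell \ge d} \tilde u_\ell \sim -2\lambda^4 \rho^{-n} n^{-5/2}\int_x^\infty J_2(\lambda s/2)\,ds,$$

as $n \to \infty$. The integral is easily computed from (74): with $X = \lambda x/2$, we obtain

$$\int_x^\infty J_2(\lambda s/2)\,ds = \frac{2}{\lambda}\cdot\frac{1}{2i\pi}\sum_{k \ge 2}\frac{k^2 - 1}{6}\int_C e^{-kX\sqrt{t}}\,t^{3/2} e^t\,dt = -\frac{1}{3 \cdot 2^4\,\lambda\sqrt{\pi}}\sum_{k \ge 1}(k^2 - 1)\left(k^4X^4 - 12k^2X^2 + 12\right)e^{-k^2X^2/4}.$$

A final normalization based on Lemma 10 yields the result. □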
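The closed forms of $\theta$ and $\Theta$ lend themselves to a direct numerical sanity check. The sketch below evaluates both series (truncated at an arbitrary number of terms, adequate for $x \ge 1/2$), compares $-\Theta'(x)$ with $\theta(x)$ by a central difference, and estimates the mean $\int_0^\infty \Theta(x)\,dx$ by the trapezoidal rule. It assumes that $\Theta(x)$ equals 1 up to exponentially small terms on $[0, 1]$, so that this interval contributes 1 to the integral; the function names, step size and integration bound are illustrative choices.

```python
import math

def theta(x, terms=200):
    """Limit density: (1/768) sum_k k(k^2-1)(k^5x^5 - 80k^3x^3 + 960kx) e^{-k^2x^2/16}."""
    total = 0.0
    for k in range(2, terms + 1):            # the k = 1 term vanishes
        u = k * x
        total += k * (k * k - 1) * (u**5 - 80 * u**3 + 960 * u) * math.exp(-u * u / 16)
    return total / 768

def big_theta(x, terms=200):
    """Limit tail: (1/96) sum_k (k^2-1)(k^4x^4 - 48k^2x^2 + 192) e^{-k^2x^2/16}."""
    total = 0.0
    for k in range(2, terms + 1):
        u = k * x
        total += (k * k - 1) * (u**4 - 48 * u**2 + 192) * math.exp(-u * u / 16)
    return total / 96

def mean_estimate(step=0.01, upper=20.0):
    """Trapezoidal estimate of the integral of Theta over [0, infinity)."""
    n = int(round((upper - 1.0) / step))
    values = [big_theta(1.0 + i * step) for i in range(n + 1)]
    tail = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return 1.0 + tail                        # assumed: Theta = 1 on [0, 1]

if __name__ == "__main__":
    for x in (3.0, 4.0, 5.0, 6.0):
        h = 1e-4
        slope = (big_theta(x - h) - big_theta(x + h)) / (2 * h)
        print(f"x={x}: Theta={big_theta(x):.6f}  theta={theta(x):.6f}  -Theta'={slope:.6f}")
    print("mean estimate:", mean_estimate(), " 8*sqrt(pi)/3 =", 8 * math.sqrt(math.pi) / 3)
```

The estimated mean agrees with the constant $8\sqrt{\pi}/3$, which is the value of $\lambda\,\mathbb{E}[D_n]/\sqrt{n}$ implied by Corollary 2 together with the estimate of $\mathbb{E}[H_n]$ in Theorem 3.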

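Theorem 8 (Moments of diameter). The moments of the diameter $D_n$ of a random unrooted
tree with $n$ leaves satisfy

$$\mathbb{E}[D_n] \sim \frac{8}{3\lambda}\sqrt{\pi n}, \qquad \mathbb{E}[D_n^2] \sim \frac{16}{3\lambda^2}\left(1 + \frac{\pi^2}{3}\right) n, \qquad \mathbb{E}[D_n^3] \sim \frac{64}{\lambda^3}\sqrt{\pi}\,n^{3/2},$$

and, for all $r > 3$,

$$\mathbb{E}[D_n^r] \sim \frac{2^{2r}}{3}\,r(r-1)(r-3)\,\Gamma(r/2)\left(\zeta(r-2) - \zeta(r)\right)\lambda^{-r}\,n^{r/2}.$$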
Proof. By definition, the moments of $D_n$ are given by

(75) $\mathbb{E}[D_n^r] = \dfrac{1}{u_n}\,[z^n]\sum_{d \ge 1} d^r u_d(z),$

and, from (72) and (73) once more, we are led to the approximation $\mathbb{E}[D_n^r] \sim ([z^n]M_r)/u_n$,
where

$$M_r(z) := 2(1-y)^4\sum_{d \ge 1} d^r\,\frac{y^d}{(1 - y^{d/2})^4}$$

results from replacing $u_d$ by $\tilde u_d$ in the generating function of (75).

As for the moments of height, the singular asymptotic form of $M_r(z)$ is conveniently
determined by means of the Mellin transform technology. Set $y = e^{-\tau}$, so that $z \to \rho$
corresponds to $\tau \to 0$, with $\tau \sim \lambda\sqrt{1 - z/\rho}$. We then need the asymptotic estimation of
$M_r(e^{-\tau})$ when $\tau \to 0$. Define

$$F_r(\tau) := \sum_{d \ge 1} d^r\,\frac{e^{-d\tau}}{\left(1 - e^{-d\tau/2}\right)^4},$$

which is such that $M_r(z) \sim 2\tau^4 F_r(\tau)$. By the "harmonic sum rule" [21], the Mellin
transform $F_r^*(s)$ of $F_r(\tau)$ satisfies, for $\Re(s) > \max\{1 + r, 4\}$,

$$F_r^*(s) = \frac{2^s}{6}\,\zeta(s - r)\left(\zeta(s - 3) - \zeta(s - 1)\right)\Gamma(s).$$

The singularities in a right half-plane are known to dictate the asymptotic expansion of
$F_r(\tau)$, as $\tau \to 0$. For $r > 3$, the main contribution comes from a simple pole at $s = r + 1$
(due to the factor $\zeta(s - r)$), and we find

$$F_r(\tau) \sim \frac{2^{r+1}}{6}\left(\zeta(r-2) - \zeta(r)\right)\Gamma(r+1)\,\tau^{-r-1}, \qquad \tau \to 0,$$

which provides in turn the main term in the expansion of $M_r(z)$ as $z \to \rho$:

$$M_r(z) \sim \frac{2^{r+1}}{3}\,\lambda^{3-r}\left(\zeta(r-2) - \zeta(r)\right)\Gamma(r+1)\,(1 - z/\rho)^{-(r-3)/2}, \qquad z \to \rho.$$

Singularity analysis combined with the estimate of $u_n$ in Lemma 10 and the duplication
formula for the Gamma function then automatically yields the asymptotic form of $\mathbb{E}[D_n^r]$,
in the case $r > 3$.

For $r \le 3$, the approach is similar, but a little more care is required. For $r = 1, 2$
one needs to consider the second terms of the singular expansion of $F_r^*(s)$, at $s = 2$ and
$s = 3$, respectively. Also, the cases $r = 1$ and $r = 3$ involve logarithmic terms due to
double poles of $F_1^*(s)$ and $F_3^*(s)$ at $s = 2$ and $s = 4$. The claim follows by routine Mellin
technology and singularity analysis. □
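The harmonic-sum estimate for the diameter can be tested in the same spirit as for the height. The sketch below compares $F_r(\tau)$, summed directly, with the leading term $\frac{2^{r+1}}{6}(\zeta(r-2) - \zeta(r))\Gamma(r+1)\tau^{-r-1}$ for $r = 4$ and $r = 6$, the only cases treated because the zeta values involved have closed forms. The truncation point of the sum and the function names are assumptions of this illustration.

```python
import math

ZETA = {2: math.pi**2 / 6, 4: math.pi**4 / 90, 6: math.pi**6 / 945}

def F(r, tau):
    """Harmonic sum F_r(tau) = sum_{d>=1} d^r e^{-d tau} / (1 - e^{-d tau/2})^4 (truncated)."""
    d_max = int(80.0 / tau) + 1
    total = 0.0
    for d in range(1, d_max + 1):
        one_minus_q = -math.expm1(-0.5 * d * tau)   # 1 - e^{-d tau / 2}
        total += d**r * math.exp(-d * tau) / one_minus_q**4
    return total

def leading_term(r, tau):
    """Contribution of the simple pole at s = r + 1 (r = 4 or 6 in this sketch)."""
    zeta_diff = ZETA[r - 2] - ZETA[r]
    return (2 ** (r + 1) / 6) * zeta_diff * math.gamma(r + 1) * tau ** (-(r + 1))

if __name__ == "__main__":
    for tau in (0.1, 0.01):
        for r in (4, 6):
            print(f"r={r} tau={tau}: F/leading = {F(r, tau) / leading_term(r, tau):.8f}")
```

For $r = 4$ the ratio differs from 1 by a term of order $\tau$, which reflects the next pole, at $s = 4$; for $r = 6$ the agreement is much closer.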

7. Conclusion 

We finally conclude with two corollaries and a general comment. First, as a byproduct 
of (72) and (73), via summation and singularity analysis, we can estimate the proportion 
of centred and bicentred trees. 
Figure 8. A table comparing the asymptotic forms of the expectations of several parameters of trees,
for the two models of Cayley trees (non-plane labelled trees) and Otter trees (non-plane unlabelled 
binary trees), based on [33, 38, 41] and the present paper. Depth refers to the depth of a randomly 
chosen node in the tree; height is the maximum distance of any node from the root; diameter is 
relative to the unrooted version of the trees under consideration. 



Corollary 1. There are asymptotically as many centred trees (trees of even diameter) as
bicentred trees (trees of odd diameter):

$$[z^n]u_{\mathrm{odd}}(z) \sim [z^n]u_{\mathrm{even}}(z) \sim \frac{1}{2}\,u_n.$$

This perhaps unsurprising observation parallels one made by Szekeres [41, p. 394] in the 
case of labelled trees, where all degrees are allowed. 

Next, a comparison of expectations of height and diameter in random non-plane trees
shows the following.

Corollary 2. The ratio of the expected diameter of an unrooted tree and the expected
height of a rooted tree of the same size satisfies asymptotically

$$\lim_{n \to \infty}\frac{\mathbb{E}[D_n]}{\mathbb{E}[H_n]} = \frac{4}{3}.$$

Again, a similar observation was made by Szekeres [41, p. 396] regarding labelled trees
and the same property, with a "universal" $4/3$ factor, is expected to hold for any "ordered"
tree family (i.e., trees whose nodes have neighbours that are distinguishable; cf. our Introduction),
as argued heuristically by Aldous in [3].

The fact, established rigorously in the present paper (Theorems 1 to 8 and Corollaries 1, 
2), is that, up to scaling, height and diameter behave for some non-plane unlabelled trees 
similarly to what is known for ordered trees: see Figure 8 for some striking data. This 
brings further evidence for the hypothesis that probabilistic models, such as the Contin- 
uum Random Tree, may be applicable to unordered trees — this has indeed been recently 
confirmed, in the binary case at least, by Marckert and Miermont [32]. It is piquant to note 
that the probabilistic approach of [32] relies in part on large deviation estimates for height, 
which were developed analytically by us in the earlier conference version [7] of the present 
paper. (Recently, Haas and Miermont [24] have developed an alternative approach that
further allows them to prove the convergence of a large class of trees towards continuum
limits. This encompasses a self-contained proof of the result in [32] and further examples
with stable tree limits.) An analytic treatment of the height of unordered trees with
all degrees allowed has been given recently by Drmota and Gittenberger (see [15] and the 
account in [14]). Together with the present study, it confirms, among unordered trees, the 
existence of universal phenomena regarding height and profile, which parallel what has
been known for a long time regarding their ordered counterparts. As usual, the analytic 
approach advocated in the present paper has the advantage of providing precise estimates, 
with speed of convergence estimates, local limit laws, and convergence of moments. 

Finally, the fact that, up to a possible linear change of scale, some of the main char- 
acteristics of trees, such as height and diameter, are not sensitive to whether trees are 
planar (ordered) or not, is also of some relevance to the emerging field of "probabilis- 
tic logic" [29, 31]. For instance, there is interest there in determining the probability 
of satisfiability of random Boolean formulae obeying various randomness models (see,
e.g., [10, 23]). In this context, our results suggest that the commutativity of logical conjunc- 
tion and disjunction (reflected by the non-planarity of associated expression trees) should 
not, in many cases, have a major effect on complexity properties of random Boolean ex- 
pressions. 